
Automatic Verification of C and Java Programs: SV-COMP 2019


Abstract

This report describes the 2019 Competition on Software Verification (SV-COMP), the 8\(^{\text {th}}\) edition of a series of comparative evaluations of fully automatic software verifiers for C programs, and now also for Java programs. The competition provides a snapshot of the current state of the art in the area, and has a strong focus on replicability of its results. The repository of benchmark verification tasks now supports a new, more flexible format for task definitions (based on YAML), which was a precondition for conveniently benchmarking Java programs in the same controlled competition setting that was successfully applied in the previous years. The competition was based on 10 522 verification tasks for C programs and 368 verification tasks for Java programs. Each verification task consisted of a program and a property (reachability, memory safety, overflows, termination). SV-COMP 2019 had 31 participating verification systems from 14 countries.

1 Introduction

Software verification is an increasingly important research area, and the annual Competition on Software Verification (SV-COMP) is the showcase of the state of the art in the area, in particular, of the effectiveness and efficiency that is currently achieved by tool implementations of the most recent ideas, concepts, and algorithms for fully automatic verification. Every year, the SV-COMP project consists of two parts: (1) The collection of verification tasks and their partitioning into categories has to take place before the actual experiments start, and requires quality-assurance work on the source code in order to ensure a high-quality evaluation. It is important that the SV-COMP verification tasks reflect what the research and development community considers interesting and challenging for evaluating the effectiveness (soundness and completeness) and efficiency (performance) of state-of-the-art verification tools. (2) The actual experiments of the comparative evaluation of the relevant tool implementations are performed by the organizer of SV-COMP. Since SV-COMP shall stimulate and showcase new technology, it is necessary to explore and define standards for a reliable and reproducible execution of such a competition: we use BenchExec [19], a modern framework for reliable benchmarking and resource measurement, to run the experiments, and verification witnesses [14, 15] to validate the verification results.
As in every edition, this SV-COMP report describes the (updated) rules and definitions, presents the competition results, and discusses other interesting facts about the execution of the competition experiments. We also need to measure the success of SV-COMP by evaluating whether the main objectives of the competition are achieved (cf. [10]):
1.
provide an overview of the state of the art in software-verification technology and increase visibility of the most recent software verifiers,
 
2.
establish a repository of software-verification tasks that is publicly available for free use as standard benchmark suite for evaluating verification software,
 
3.
establish standards that make it possible to compare different verification tools, including a property language and formats for the results, and
 
4.
accelerate the transfer of new verification technology to industrial practice.
 
As for (1), there were 31 participating software systems from 14 countries, representing a broad spectrum of technologies (cf. Table 5). SV-COMP is considered an important event in the research community, and increasingly also in industry. As for (2), the total set of verification tasks written in C increased in size from 8 908 tasks to 10 522 tasks from 2017 to 2019, and in addition, 368 tasks written in Java were added for 2019. Still, SV-COMP has an ongoing focus on collecting and constructing verification tasks to ensure even more diversity, as witnessed by the issue tracker and by the pull requests in the GitHub project. As for (3), the largest step forward was to establish an exchangeable standard format for verification witnesses. This means that verification results are fully counted only if they can be independently validated. As for (4), we continuously receive positive feedback from industry: colleagues from industry reported to us that they observe SV-COMP in order to know about the newest and best available verification tools. Moreover, since SV-COMP 2017, a few systems from industry have also participated.
Related Competitions. It is well understood that competitions are an important evaluation method, and there are other competitions in the field of software verification: RERS [40] and VerifyThis [41]. While SV-COMP performs replicable experiments in a controlled environment (dedicated resources, resource limits), the RERS Challenges give more room for exploring combinations of interactive and automatic approaches without limits on the resources, and the VerifyThis Competition focuses on evaluating approaches and ideas rather than on fully automatic verification. The termination competition termCOMP [33] concentrates on termination but considers a broader range of systems, including logic and functional programs. This year, SV-COMP is part of TOOLympics [6]. A more comprehensive list of other competitions is provided in the report on SV-COMP 2014 [9]. There are other large benchmark collections as well (e.g., by SPEC), but the sv-benchmark suite is (a) free of charge and (b) tailored to the state of the art in software verification. There is a certain flow of benchmark sets between benchmark repositories: for example, the sv-benchmark suite contains programs that were used in RERS or in termCOMP before.

2 Procedure

The overall competition organization did not change in comparison to the past editions [7–12]. SV-COMP is an open competition, where all verification tasks are known before the submission of the participating verifiers, which is necessary due to the complexity of the C language. During the benchmark submission phase, new verification tasks were collected, classified, and added to the existing benchmark suite (i.e., SV-COMP uses an accumulating benchmark suite). During the training phase, the teams inspected the verification tasks and trained their verifiers (also, the verification tasks received fixes and quality improvements). During the evaluation phase, verification runs were performed with all competition candidates, and the system descriptions were reviewed by the competition jury. The participants received the results of their verifier directly via e-mail, and after a few days of inspection, the results were publicly announced on the competition web site. The Competition Jury consisted again of the chair and one member of each participating team. Team representatives of the jury are listed in Table 4.

3 Definitions, Formats, and Rules

License Requirements. Starting in 2018, SV-COMP has required that each verifier be publicly available for download and come with a license that
(i)
allows replication and evaluation by anybody (including results publication),
 
(ii)
does not restrict the usage of the verifier output (log files, witnesses), and
 
(iii)
allows any kind of (re-)distribution of the unmodified verifier archive.
 
Verification Tasks. The definition of verification tasks was not changed, and we refer to previous reports for more details [9, 12]. The validation of the results based on verification witnesses [14, 15] was done exactly as in the previous years (2017, 2018), mandatory for both answers True and False; the only change was that an additional, execution-based witness validator [16] was used. A few categories were excluded from validation if the validators did not sufficiently support a certain kind of program or property.
Categories, Properties, Scoring Schema, and Ranking. The categories are listed in Tables 6 and 7 and described in detail on the competition web site.
Table 1. Properties used in SV-COMP 2019 (G valid-memcleanup is new)

G ! call(foo()): A call to function foo is not reachable on any finite execution.

G valid-free: All memory deallocations are valid (counterexample: invalid free). More precisely: There exists no finite execution of the program on which an invalid memory deallocation occurs.

G valid-deref: All pointer dereferences are valid (counterexample: invalid dereference). More precisely: There exists no finite execution of the program on which an invalid pointer dereference occurs.

G valid-memtrack: All allocated memory is tracked, i.e., pointed to or deallocated (counterexample: memory leak). More precisely: There exists no finite execution of the program on which the program lost track of some previously allocated memory.

G valid-memcleanup: All allocated memory is deallocated before the program terminates. In addition to valid-memtrack: There exists no finite execution of the program on which the program terminates but still points to allocated memory. (Comparison to Valgrind: this property can be violated even if Valgrind reports 'still reachable'.)

F end: All program executions are finite and end on proposition end, which marks all program exits (counterexample: infinite loop). More precisely: There exists no execution of the program on which the program never terminates.
Table 2. Scoring schema for SV-COMP 2019 (unchanged since 2017 [12])

Unknown (0 points): Failure to compute a verification result.
False correct (+1 point): Violation of property in program was correctly found and a validator confirmed the result based on a witness.
False incorrect (−16 points): Violation reported but property holds (false alarm).
True correct (+2 points): Program correctly reported to satisfy property and a validator confirmed the result based on a witness.
True correct unconfirmed (+1 point): Program correctly reported to satisfy property, but the witness was not confirmed by a validator.
True incorrect (−32 points): Incorrect program reported as correct (wrong proof).
Figure 1 shows the category composition. For the definition of the properties and the property format, we refer to the 2015 competition report [10]. All specifications are available in the directory c/properties/ of the benchmark repository. Table 1 lists the properties and their syntactical representation as an overview. Property G valid-memcleanup was used for the first time in SV-COMP 2019.
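As an illustration of the property format, the reachability property of Table 1 can be written as a specification file roughly as follows (a sketch: the CHECK/LTL wrapper follows the property syntax of [10], and instantiating foo with the C-track error function __VERIFIER_error is an assumption; the file c/properties/unreach-call.prp in the repository is authoritative):

    CHECK( init(main()), LTL(G ! call(__VERIFIER_error())) )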
The scoring schema is identical for SV-COMP 2017–2019: Table 2 provides the overview and Fig. 2 visually illustrates the score assignment for one property. The scoring schema still contains the special rule for unconfirmed correct results for expected result True that was introduced in the transitioning phase: one point is assigned if the answer matches the expected result but the witness was not confirmed. Starting with SV-COMP 2020, the single-point rule will be dropped, i.e., points are only assigned if the results are validated or no validator was available.
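As a worked example of this schema (with illustrative numbers and the point values from Table 2): a verifier that reports 100 confirmed True results, 20 unconfirmed True results, 50 confirmed False results, 2 false alarms, and 1 wrong proof obtains

    100*(+2) + 20*(+1) + 50*(+1) + 2*(-16) + 1*(-32)  =  200 + 20 + 50 - 32 - 32  =  206

score points in that category (before any normalization for meta categories).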
The ranking was again decided based on the sum of points (normalized for meta categories) and, in case of a tie, by the success run time, which is the total CPU time over all verification tasks for which the verifier reported a correct verification result. Opt-out from Categories and Score Normalization for Meta Categories was done as described previously [8] (page 597).

4 New Format for Task Definitions

Technically, we need a verification task (a pair of a program and a specification to verify) to feed as input to the verifier, and an expected result against which we check the answer that the verifier returns. We changed the format of how these tasks are specified for SV-COMP 2019: The C track is still based on the old format, while the Java track already uses the new format.
Recap: Old Format. Previously, the above-mentioned three components were specified in the file name of the program. For example, the file name c/ntdrivers/floppy_true-unreach-call_true-valid-memsafety.i.cil.c encodes the program, the specification (consisting of two properties), and the two expected results (one for each property) in the following way:
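(The decomposition below is a sketch based on the naming convention just described; reading the suffix .i.cil.c as a preprocessed, CIL-transformed C file is an assumption.)

    floppy                    the program (in directory c/ntdrivers/)
    _true-unreach-call        property unreach-call, expected verdict: true
    _true-valid-memsafety     property valid-memsafety, expected verdict: true
    .i.cil.c                  file-name suffix of a preprocessed (CIL-transformed) C program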
This format was used for eight years of SV-COMP, because it is easy to understand and use. However, whenever a new property is added to the specification of a given program, the program's file name needs to be changed, which has a negative impact on traceability and maintenance. From SV-COMP 2020 onwards, the repository will use the following new format for all tracks.
Explicit Task-Definition Files. All the above-discussed information is stored in an extra file that contains a structured definition of the verification tasks for a program. For each program, the repository contains the program file and a task-definition file. The above program is available under the name floppy.i.cil-3.c and comes with its task-definition file floppy.i.cil-3.yml. Figure 3 shows this task definition.
The task definition uses YAML as the underlying structured data format. It contains a version id of the format (line 1) and can contain comments (line 3). The field input_files specifies the input program (example: 'floppy.i.cil-3.c'), which is either one file or a list of files. The field properties lists all properties of the specification for this program. Each property has a field property_file that specifies the property file (example: ../properties/unreach-call.prp) and a field expected_verdict that specifies the expected result (example: true).
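A minimal sketch of such a task-definition file, assembled from the fields described above (the field name format_version for the version id, the comment text, and the second property entry are assumptions; Fig. 3 shows the actual file):

    format_version: '1.0'

    # floppy.i.cil-3.yml: task definition for the floppy example from c/ntdrivers/

    input_files: 'floppy.i.cil-3.c'

    properties:
      - property_file: ../properties/unreach-call.prp
        expected_verdict: true
      - property_file: ../properties/valid-memsafety.prp
        expected_verdict: true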

5 Including Java Programs

The first seven editions of SV-COMP considered only programs written in C. In 2019, the competition was extended to include a Java track. Some of the Java programs existed already in the repository, and many other Java programs were contributed by the community [29]. Currently, most of the programs stem from the regression-test suites of the verifiers that participate in the Java track; the goal is to substantially increase the benchmark set over the next years.
In principle, the same definitions and rules as for the C track apply, but some technical details need to be slightly adapted for Java programs. Most prominently, the classes of a Java program cannot be inlined into one Java file; this is solved by the new task-definition format, which allows lists of input files. This required an extension of BenchExec that is present in version 1.17 and higher.
The property for reachability is also slightly different, as shown in Fig. 4: The function call to the start of the program is Main.main() instead of main(), and the verifiers check that the proposition assert is always true, instead of checking that the error function __VERIFIER_error() is never called. The new proposition assert is false where a Java assert statement fails, i.e., where the exception AssertionError is thrown.
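Assuming the same property syntax as in the C track, the Java reachability property shown in Fig. 4 plausibly reads as follows (a sketch; the property file in the repository is authoritative):

    CHECK( init(Main.main()), LTL(G assert) )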
The rules for the C track specify a function __VERIFIER_nondet_X() for each type X from the set {bool, char, int, float, double, loff_t, long, pchar, pointer, pthread_t, sector_t, short, size_t, u32, uchar, uint, ulong, unsigned, ushort} (no side effects, pointer for void *, etc.) that all return an arbitrary, nondeterministic value ('input' value) of the respective type, which may differ for different invocations. Similarly for the Java track: we use a Java class named org.sosy_lab.sv_benchmarks.Verifier with the following parameter-less static methods: nondetBoolean, nondetByte, nondetChar, nondetShort, nondetInt, nondetLong, nondetFloat, nondetDouble, and nondetString. Each of those methods creates a value of the respective type using functionality from java.util.Random. The earlier proposal [29] to use the array of arguments that is passed to the main method to obtain nondeterministic values was not followed; the SV-COMP community found that explicitly defined methods are better for the competition, and also closer to practice.
Finally, the static method assume(boolean) in the same class can be used to assume a certain value range. The implementation halts the program using Runtime.getRuntime().halt(1) if the assumed condition does not hold. It was proposed [29] to omit this method, but in the end the community decided to include it.
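A compact sketch of the Verifier class as described above, together with a small verification task that uses it (only a subset of the listed nondet methods is shown; the actual class in the benchmark repository may differ in details, and the task class Main with its assertion is purely illustrative):

    package org.sosy_lab.sv_benchmarks;

    import java.util.Random;

    // Sketch of the helper class described in the text.
    public final class Verifier {
      private static final Random random = new Random();

      public static boolean nondetBoolean() { return random.nextBoolean(); }
      public static int     nondetInt()     { return random.nextInt(); }
      public static long    nondetLong()    { return random.nextLong(); }
      public static double  nondetDouble()  { return random.nextDouble(); }

      // Halts the program if the assumed condition does not hold.
      public static void assume(boolean condition) {
        if (!condition) {
          Runtime.getRuntime().halt(1);
        }
      }
    }

    // In a separate file: an illustrative verification task.
    import org.sosy_lab.sv_benchmarks.Verifier;

    public class Main {
      public static void main(String[] args) {
        int x = Verifier.nondetInt();        // nondeterministic 'input' value
        Verifier.assume(x > 0 && x < 1000);  // restrict the considered inputs
        int doubled = x + x;
        // Proposition assert is violated iff an AssertionError is thrown;
        // here the assertion holds for every allowed input (expected verdict: true).
        assert doubled % 2 == 0;
      }
    }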

6 Reproducibility

It is important that all SV-COMP experiments can be independently replicated, and that the results can be reproduced. Therefore, all major components that are used for the competition need to be publicly available. Figure 5 gives an overview of the components that contribute to the reproducible setup of SV-COMP, and Table 3 provides the details. We refer to a previous report [11] for a description of all components of the SV-COMP organization and how it is ensured that all parts are publicly available for maximal replicability.
Table 3. Publicly available components for replicating SV-COMP 2019 (component, label in Fig. 5, and version)

Verification Tasks (a): svcomp19
Benchmark Definitions (b): svcomp19
Tool-Info Modules (c): 1.17
Verifier Archives (d): svcomp19
Benchmarking (e): 1.17
Witness Format (f): svcomp19
Since SV-COMP 2018, we use a more transparent way of making the verifier archives publicly available: all verifier archives are now stored in a public Git repository. We chose GitLab to host the repository for the verifier archives due to its generous repository size limit of 10 GB (we could not use GitHub, because it has a strict size limit of 100 MB per file, and recommends keeping the repository under 1 GB). An overview table with information about all participating systems is provided in Table 4 and on the competition web site.
In addition to providing the components to replicate the experiments, SV-COMP also makes the raw results available in the XML-based exchange format in which BenchExec [20] delivers the data, and also publishes all verification witnesses [13].
Table 4.
Competition candidates with tool references and representing jury members
Participant
Ref.
Jury member
Affiliation
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figd_HTML.gif
[49, 61]
Peter Schrammel
U. of Sussex, UK
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Fige_HTML.gif
[34, 38]
Jera Hensel
RWTH Aachen, Germany
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figf_HTML.gif
[46]
Michael Tautschnig
Amazon Web Services, UK
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figg_HTML.gif
[44]
Kareem Khazem
U. College London, UK
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figh_HTML.gif
[1, 64]
Vadim Mutilin
ISP RAS, Russia
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figi_HTML.gif
[2]
Pavel Andrianov
ISP RAS, Russia
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figj_HTML.gif
[18, 30]
Marie-Christine Jakobs
LMU Munich, Germany
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figk_HTML.gif
[58, 60]
Omar Alhawi
U. of Manchester, UK
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figl_HTML.gif
[5, 62]
Vladimír Štill
Masaryk U., Czechia
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figm_HTML.gif
[47, 48]
Henrich Lauko
Masaryk U., Czechia
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Fign_HTML.gif
[31, 32]
Mikhail R. Gadelha
U. of Southampton, UK
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figo_HTML.gif
[42, 43]
Philipp Rümmer
Uppsala U., Sweden
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figp_HTML.gif
[27, 28]
Lucas Cordeiro
U. of Manchester, UK
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figq_HTML.gif
[3, 63]
Cyrille Artho
KTH, Sweden
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figr_HTML.gif
[50]
Omar Inverso
Gran Sasso Science Inst., Italy
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figs_HTML.gif
[57, 59]
Herbert Rocha
Federal U. of Roraima, Brazil
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figt_HTML.gif
[56]
Cedric Richter
U. of Paderborn, Germany
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figu_HTML.gif
[24]
Eti Chaudhary
IIT Hyderabad, India
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figv_HTML.gif
[39, 45]
Veronika Šoková
BUT, Brno, Czechia
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figw_HTML.gif
[21]
Franck Cassez
Macquarie U., Australia
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figx_HTML.gif
[36, 55]
Zvonimir Rakamaric
U. of Utah, USA
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figy_HTML.gif
[51, 53]
Willem Visser
Stellenbosch U., South Africa
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figz_HTML.gif
[22, 23]
Marek Chalupa
Masaryk U., Czechia
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figaa_HTML.gif
[37]
Matthias Heizmann
U. of Freiburg, Germany
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figab_HTML.gif
[52]
Alexander Nutz
U. of Freiburg, Germany
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figac_HTML.gif
[35]
Daniel Dietsch
U. of Freiburg, Germany
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figad_HTML.gif
[25]
Priyanka Darke
Tata Consultancy Services, India
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figae_HTML.gif
[26]
R. K. Medicherla
Tata Consultancy Services, India
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figaf_HTML.gif
[54]
Pritom Rajkhowa
Hong Kong UST, China
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figag_HTML.gif
[65, 66]
Liangze Yin
Nat. U. of Defense Techn., China
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figah_HTML.gif
[67]
Haining Feng
Nat. U. of Defense Techn., China

7 Results and Discussion

For the eighth time, the competition experiments represent the state of the art in fully automatic software-verification tools. The report shows the improvements compared to last year, in terms of effectiveness (number of verification tasks that can be solved, correctness of the results, as accumulated in the score) and efficiency (resource consumption in terms of CPU time). The results that are presented in this article were inspected and approved by the participating teams.
Table 5.
Technologies and features that the competition candidates offer
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Tab5_HTML.png
Participating Verifiers. Table 4 provides an overview of the participating verification systems and Table 5 lists the features and technologies that are used in the verification tools.
Computing Resources. The resource limits were the same as in the previous competitions [11]: Each verification run was limited to 8 processing units (cores), 15 GB of memory, and 15 min of CPU time. The witness validation was limited to 2 processing units, 7 GB of memory, and 1.5 min of CPU time for violation witnesses and 15 min of CPU time for correctness witnesses. The machines for running the experiments are part of a compute cluster that consists of 168 machines; each verification run was executed on an otherwise completely unloaded, dedicated machine, in order to achieve precise measurements. Each machine had one Intel Xeon E3-1230 v5 CPU with 8 processing units, a frequency of 3.4 GHz, 33 GB of RAM, and a GNU/Linux operating system (x86_64-linux, Ubuntu 18.04 with Linux kernel 4.15). We used BenchExec [19] to measure and control computing resources (CPU time, memory, CPU energy) and VerifierCloud to distribute, install, run, and clean-up verification runs, and to collect the results.
One complete verification execution of the competition consisted of 418 benchmarks (each verifier on each selected category according to the opt-outs), summing up to 178 674 verification runs. The total consumed CPU time was 461 days for one complete competition run for verification (without validation). Witness-based result validation required 2 645 benchmarks (combinations of verifier, category with witness validation, and a set of validators), summing up to 517 175 validation runs. Each tool was executed several times, in order to make sure that no installation issues occurred during the execution. Including pre-runs, the infrastructure managed a total of 5 880 071 runs and consumed 15 years and 182 days of CPU time.
Table 6.
Quantitative overview over all results; empty cells mark opt-outs
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Tab6_HTML.png
Table 7.
Overview of the top-three verifiers for each category (CPU time in h, rounded to two significant digits)
Rank
Verifier
Score
CPU Time
Solved Tasks
False Alarms
Wrong Proofs
ReachSafety
1
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/483146_1_En_9_IEq9_HTML.gif
4638
110
2 811
 
2
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figaj_HTML.gif
4299
60
2 519
 
3
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figak_HTML.gif
4239
58
2 431
2
 
MemSafety
1
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/483146_1_En_9_IEq10_HTML.gif
426
.030
299
 
2
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figal_HTML.gif
416
.61
296
 
3
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figam_HTML.gif
349
.55
256
 
ConcurrencySafety
1
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/483146_1_En_9_IEq11_HTML.gif
1277
.31
1 026
 
2
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figan_HTML.gif
1245
3.0
1 006
 
3
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figao_HTML.gif
996
13
830
 
NoOverflows
1
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/483146_1_En_9_IEq12_HTML.gif
449
.94
306
 
2
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figap_HTML.gif
438
.96
302
 
3
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figaq_HTML.gif
431
.59
283
 
Termination
1
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/483146_1_En_9_IEq13_HTML.gif
3001
13
1 662
 
2
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figar_HTML.gif
2476
33
1 004
 
3
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figas_HTML.gif
1785
15
1 319
 
SoftwareSystems
1
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/483146_1_En_9_IEq14_HTML.gif
1185
9.1
1 572
 
7
2
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figat_HTML.gif
1073
28
1 447
 
3
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figau_HTML.gif
1061
24
1 407
 
C-FalsificationOverall
1
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/483146_1_En_9_IEq15_HTML.gif
2823
40
2 129
 
2
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figav_HTML.gif
2313
53
2 105
8
 
3
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figaw_HTML.gif
1916
15
1 753
14
 
C-Overall
1
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/483146_1_En_9_IEq16_HTML.gif
9329
120
6 654
 
2
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figax_HTML.gif
8466
120
6 466
8
1
3
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figay_HTML.gif
6727
85
5 454
5
10
Java-Overall
1
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/483146_1_En_9_IEq17_HTML.gif
470
2.7
331
 
2
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figaz_HTML.gif
365
.27
337
4
2
3
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figba_HTML.gif
290
.15
331
 
6
Quantitative Results. Table 6 presents the quantitative overview over all tools and all categories. The head row mentions the category, the maximal score for the category, and the number of verification tasks. The tools are listed in alphabetical order; every table row lists the scores of one verifier. We indicate the top three candidates by formatting their scores in bold face and in larger font size. An empty table cell means that the verifier opted out of the respective main category (perhaps participating in subcategories only, restricting the evaluation to a specific topic). More information (including interactive tables, quantile plots for every category, and also the raw data in XML format) is available on the competition web site.
Table 7 reports the top three verifiers for each category. The run time (column ‘CPU Time’) refers to successfully solved verification tasks (column ‘Solved Tasks’). The columns ‘False Alarms’ and ‘Wrong Proofs’ report the number of verification tasks for which the verifier reported wrong results: reporting an error path when the property holds (incorrect False) and claiming that the program fulfills the property although it actually contains a bug (incorrect True), respectively.
Discussion of Scoring Schema. The verification community considers computation of correctness proofs to be more difficult than computing error paths: according to Table 2, an answer True yields 2 points (confirmed witness) and 1 point (unconfirmed witness), while an answer False yields 1 point (confirmed witness). This can have consequences for the final ranking, as discussed in the report of SV-COMP 2016 [11]. The data from SV-COMP 2019 draw a different picture.
Table 8.
Necessary effort to compute results False versus True (measurement values rounded to two significant digits)
Result
True
False
CPU Time (s)
CPU Energy (J)
CPU Time (s)
CPU Energy (J)
mean
median
mean
median
mean
median
mean
median
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbb_HTML.gif
67
9.5
690
82
58
14
560
120
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbc_HTML.gif
56
19
540
160
77
26
680
220
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbd_HTML.gif
56
17
540
140
58
19
570
180
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbe_HTML.gif
4.8
.25
57
2.9
19
.45
210
5.5
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbf_HTML.gif
8.6
.20
91
2.3
21
.24
180
2.8
Table 8 shows the mean and median values for resource consumption regarding CPU time and energy consumption: the first column lists the five best verifiers of category C-Overall, the second to fifth columns report the CPU time and CPU energy (mean and median) for results True, and the sixth to ninth columns for results False. The mean and median are taken over successfully solved verification tasks; the CPU time is reported in seconds and the CPU energy in joules (BenchExec reads and accumulates the energy measurements of Intel CPUs using the tool CPU Energy Meter).
Score-Based Quantile Functions for Quality Assessment. We use score-based quantile functions [8] because these visualizations make it easier to understand the results of the comparative evaluation. The competition web site includes such a plot for each category; as an example, we show the plot for category C-Overall (all verification tasks) in Fig. 6. A total of 13 verifiers participated in category C-Overall, for which the quantile plot shows the overall performance over all categories (scores for meta categories are normalized [8]). A more detailed discussion of score-based quantile plots, including examples of what insights one can obtain from the plots, is provided in previous competition reports [8, 11].
Table 9.
Alternative rankings; quality is given in score points (sp), CPU time in hours (h), energy in kilojoule (kJ), wrong results in errors (E), rank measures in errors per score point (E/sp), joule per score point (J/sp), and score points (sp)
Rank
Verifier
Quality (sp)
CPU Time (h)
CPU Energy (kJ)
Solved Tasks
Wrong Results (E)
Rank Measure
Correct Verifiers
(E/sp)
1
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/483146_1_En_9_IEq19_HTML.gif
9 329
120
4 300
2 811
0
.0000
2
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbh_HTML.gif
6 129
9.7
390
2 519
0
.0000
3
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbi_HTML.gif
8 466
120
3 900
2 431
9
.0011
worst
.3836
Green Verifiers
(J/sp)
1
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/483146_1_En_9_IEq20_HTML.gif
6 129
9.7
390
299
0
64
2
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbj_HTML.gif
4 341
11
380
296
14
88
3
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbk_HTML.gif
1 547
4.4
180
256
10
120
worst
4 200
New Verifiers
(sp)
1
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/483146_1_En_9_IEq21_HTML.gif
8 466
120
3 900
1 026
9
8 466
2
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbl_HTML.gif
1 587
8.9
380
1 006
69
1 587
Alternative Rankings. The community suggested reporting a couple of alternative rankings that honor different aspects of the verification process, as a complement to the official SV-COMP ranking. Table 9 is similar to Table 7, but contains the alternative ranking categories Correct, Green, and New Verifiers. Column 'Quality' gives the score in score points, column 'CPU Time' the CPU usage of successful runs in hours, column 'CPU Energy' the CPU usage of successful runs in kilojoules, column 'Solved Tasks' the number of correct results, column 'Wrong Results' the sum of false alarms and wrong proofs in number of errors, and column 'Rank Measure' the measure used to determine the alternative rank.
Correct Verifiers—Low Failure Rate. The right-most columns of Table 7 report that the verifiers achieve a high degree of correctness (all top three verifiers in the C track have less than 1 % wrong results). The winners of categories C-Overall and Java-Overall did not produce a single wrong answer.
The first category in Table 9 uses a failure rate as rank measure: \(\frac{\textsf {number\,of\,incorrect\,results}}{\textsf {total\,score}}\), the number of errors per score point (\(E{\slash }sp\)). We use E as unit for \(\textsf {number\,of\,incorrect\,results}\) and sp as unit for \(\textsf {total\,score}\). The total score is used as tie-breaker to distinguish the rank of error-free verifiers.
Green Verifiers—Low Energy Consumption. Since a large part of the cost of verification is given by the energy consumption, it might be important to also consider the energy efficiency. The second category in Table 9 uses the energy consumption per score point as rank measure: \(\frac{\textsf {total\,CPU\,energy}}{\textsf {total\,score}}\), with the unit \(J{\slash }sp\).
Table 10.
Confirmation rate of verification witnesses in SV-COMP 2019
Result
True
False
Total
Confirmed
Unconf.
Total
Confirmed
Unconf.
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbm_HTML.gif
4 417
3 968
90 %
449
2 859
2 686
94 %
173
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbn_HTML.gif
4 176
3 814
91 %
362
2 823
2 652
94 %
171
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbo_HTML.gif
4 244
4 199
99 %
45
1 523
1 255
82 %
268
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbp_HTML.gif
2 430
2 381
98 %
49
1 451
1 214
84 %
237
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbq_HTML.gif
1 813
1 702
94 %
111
1 975
1 248
63 %
727
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbr_HTML.gif
3 015
2 936
97 %
79
915
653
71 %
262
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbs_HTML.gif
2 072
2 045
99 %
27
1 419
945
67 %
474
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbt_HTML.gif
3 679
3 556
97 %
123
2 141
1 753
82 %
388
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbu_HTML.gif
2 070
2 038
98 %
32
553
548
99 %
5
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbv_HTML.gif
1 206
1 162
96 %
44
897
670
75 %
727
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbw_HTML.gif
693
673
97 %
20
768
353
46 %
415
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figbx_HTML.gif
645
626
97 %
19
943
601
64 %
342
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-17502-3_9/MediaObjects/483146_1_En_9_Figby_HTML.gif
612
602
98 %
10
1 938
1 370
71 %
568
New Verifiers—High Quality. To acknowledge the achievements of verifiers that participate for the first time in SV-COMP, the third category in Table 9 uses the quality in score points as rank measure, that is, the official SV-COMP rank measure, but with the set of verifiers restricted to those that participate for the first time. The Java track consists exclusively of new verifiers, so the new-verifiers ranking is the same as the official ranking.
Verifiable Witnesses. For SV-COMP, it is not sufficient to answer with just True or False: each answer should be accompanied by a verification witness. All verifiers in categories that required witness validation support the common exchange format for violation and correctness witnesses. We used four independently developed witness-based result validators [14–16].
The majority of witnesses that the verifiers produced can be confirmed by the results-validation process. Interestingly, the confirmation rate for the True results is significantly higher than for the False results. Table 10 shows the confirmed versus unconfirmed results: the first column lists the verifiers of category C-Overall, the three columns for result True report the total, confirmed, and unconfirmed number of verification tasks for which the verifier answered with True, and the three columns for result False report the same for answer False. More information (for all verifiers) is given in the detailed tables on the competition web site; cf. also the report on the demo category for correctness witnesses from SV-COMP 2016 [11]. Result validation is an important topic also in other competitions (e.g., in the SAT competition [4]).

8 Conclusion

SV-COMP 2019, the 8\(^{\text {th}}\) edition of the Competition on Software Verification, attracted 31 participating teams from 14 countries (see Fig. 7 for the development). SV-COMP continues to offer the broadest overview of the state of the art in automatic software verification. For the first time, the competition included Java verification; this track had four participating verifiers. The competition not only executes the verifiers and collects results, but also tries to validate the verification results, based on the latest versions of four independently developed results validators. The number of verification tasks was increased to 10 522 for C and to 368 for Java. As before, the large jury and the organizer made sure that the competition follows the high quality standards of the TACAS conference, in particular with respect to the important principles of fairness, community support, and transparency.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
References
1. Andrianov, P., Friedberger, K., Mandrykin, M.U., Mutilin, V.S., Volkov, A.: CPA-BAM-BnB: Block-abstraction memoization and region-based memory models for predicate abstractions (competition contribution). In: Proc. TACAS, LNCS, vol. 10206, pp. 355–359. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54580-5_22
4. Balyo, T., Heule, M.J.H., Järvisalo, M.: SAT Competition 2016: Recent developments. In: Proc. AI, pp. 5061–5063. AAAI Press (2017)
5. Baranová, Z., Barnat, J., Kejstová, K., Kučera, T., Lauko, H., Mrázek, J., Ročkai, P., Štill, V.: Model checking of C and C++ with DIVINE 4. In: Proc. ATVA, LNCS, vol. 10482, pp. 201–207. Springer, Cham (2017)
6. Bartocci, E., Beyer, D., Black, P.E., Fedyukovich, G., Garavel, H., Hartmanns, A., Huisman, M., Kordon, F., Nagele, J., Sighireanu, M., Steffen, B., Suda, M., Sutcliffe, G., Weber, T., Yamada, A.: TOOLympics 2019: An overview of competitions in formal methods. In: Proc. TACAS, Part 3, LNCS, vol. 11429, pp. 3–24. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-17502-3_1
21. Cassez, F., Sloane, A.M., Roberts, M., Pigram, M., Suvanpong, P., de Aledo Marugán, P.G.: Skink: Static analysis of programs in LLVM intermediate representation (competition contribution). In: Proc. TACAS, LNCS, vol. 10206, pp. 380–384. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54580-5_27
29. Cordeiro, L.C., Kröning, D., Schrammel, P.: Benchmarking of Java verification tools at the software verification competition (SV-COMP). CoRR abs/1809.03739 (2018)
34. Giesl, J., Aschermann, C., Brockschmidt, M., Emmes, F., Frohn, F., Fuhs, C., Hensel, J., Otto, C., Plücker, M., Schneider-Kamp, P., Ströder, T., Swiderski, S., Thiemann, R.: Analyzing program termination and complexity automatically with AProVE. J. Autom. Reason. 58(1), 3–31 (2017)
35. Greitschus, M., Dietsch, D., Heizmann, M., Nutz, A., Schätzle, C., Schilling, C., Schüssele, F., Podelski, A.: Ultimate Taipan: Trace abstraction and abstract interpretation (competition contribution). In: Proc. TACAS, LNCS, vol. 10206, pp. 399–403. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54580-5_31
36. Haran, A., Carter, M., Emmi, M., Lal, A., Qadeer, S., Rakamarić, Z.: SMACK+Corral: A modular verifier (competition contribution). In: Proc. TACAS, LNCS, vol. 9035, pp. 451–454. Springer, Heidelberg (2015)
37. Heizmann, M., Chen, Y., Dietsch, D., Greitschus, M., Nutz, A., Musa, B., Schätzle, C., Schilling, C., Schüssele, F., Podelski, A.: Ultimate Automizer with an on-demand construction of Floyd-Hoare automata (competition contribution). In: Proc. TACAS, LNCS, vol. 10206, pp. 394–398. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54580-5_30
38.
39.
45. Kotoun, M., Peringer, P., Soková, V., Vojnar, T.: Optimized Predators and the SV-COMP heap and memory safety benchmark (competition contribution). In: Proc. TACAS, LNCS, vol. 9636, pp. 942–945. Springer, Heidelberg (2016)
46. Kröning, D., Tautschnig, M.: CBMC: C bounded model checker (competition contribution). In: Proc. TACAS, LNCS, vol. 8413, pp. 389–391. Springer, Heidelberg (2014)
47. Lauko, H., Ročkai, P., Barnat, J.: Symbolic computation via program transformation. In: Proc. ICTAC, LNCS, vol. 11187, pp. 313–332. Springer, Cham (2018)
49. Malik, V., Hruska, M., Schrammel, P., Vojnar, T.: 2LS: Heap analysis and memory safety (competition contribution). Tech. Rep. abs/1903.00712, CoRR (2019)
50. Nguyen, T.L., Inverso, O., Fischer, B., La Torre, S., Parlato, G.: Lazy-CSeq 2.0: Combining lazy sequentialization with abstract interpretation (competition contribution). In: Proc. TACAS, LNCS, vol. 10206, pp. 375–379. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54580-5_26
53. Pasareanu, C.S., Visser, W., Bushnell, D.H., Geldenhuys, J., Mehlitz, P.C., Rungta, N.: Symbolic PathFinder: Integrating symbolic execution with model checking for Java bytecode analysis. Autom. Softw. Eng. 20(3), 391–425 (2013)
59. Rocha, H.O., Barreto, R.S., Cordeiro, L.C.: Hunting memory bugs in C programs with Map2Check (competition contribution). In: Proc. TACAS, LNCS, vol. 9636, pp. 934–937. Springer, Heidelberg (2016)
60.
62. Štill, V., Ročkai, P., Barnat, J.: DIVINE: Explicit-state LTL model checker (competition contribution). In: Proc. TACAS, LNCS, vol. 9636, pp. 920–922. Springer, Heidelberg (2016)
63. Visser, W., Havelund, K., Brat, G., Park, S., Lerda, F.: Model checking programs. Autom. Softw. Eng. 10(2), 203–232 (2003)
65. Yin, L., Dong, W., Liu, W., Li, Y., Wang, J.: YOGAR-CBMC: CBMC with scheduling constraint based abstraction refinement (competition contribution). In: Proc. TACAS, LNCS, vol. 10806, pp. 422–426. Springer, Cham (2018)
67. Yin, L., Dong, W., Liu, W., Wang, J.: Parallel refinement for multi-threaded program verification. In: Proc. ICSE. IEEE (2019)
Metadata
Title
Automatic Verification of C and Java Programs: SV-COMP 2019
Author
Dirk Beyer
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-17502-3_9