Zum Inhalt
Erschienen in:

Open Access 2025 | OriginalPaper | Buchkapitel

Fizzer with Local Space Fuzzing

(Competition Contribution)

verfasst von : Martin Jonáš, Jan Strejček, Marek Trtík

Erschienen in: Fundamental Approaches to Software Engineering

Verlag: Springer Nature Switzerland

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In diesem Kapitel wird ein ausgeklügelter Ansatz zur Testgenerierung vorgestellt, der dynamische Analysen nutzt, um eine hohe Branchenabdeckung zu erreichen. Das Herzstück dieser Methode ist Fizzer, ein Graukasten-Fuzzer, der Programme instrumentiert, um detaillierte Ausführungsinformationen zu sammeln, wobei der Schwerpunkt auf der Auswertung atomarer Boolescher Ausdrücke (abes) liegt. Im Gegensatz zu herkömmlichen Graustufen-Fuzzern sammelt Fizzer umfangreiche Daten, um zielgerichtetere Eingaben zu erstellen, die darauf abzielen, jede einzelne Abe in verschiedenen Aufrufkontexten auszuwerten. Die ursprüngliche Version von Fizzer, die eine Bronzemedaille in der Kategorie Cover-Branches von Test-Comp 2024 gewann, wurde um eine dynamische Tint-Flow-Analyse und lokale Suchanalysen erweitert, um ihre Grenzen zu adressieren. Diese Verbesserungen mildern Probleme wie langsame Sensitivitätsanalysen und Abweichungen beim Ausführungspfad ab, was zu einer besseren Branchenabdeckung führt. Das Kapitel behandelt auch die Softwarearchitektur von Fizzer, ihre Implementierung in C + + und ihre Abhängigkeit von der LLVM-Infrastruktur. Es vergleicht die Abdeckungsleistungen der ursprünglichen und der neuen Versionen von Fizzer auf Cover-Branches-Benchmarks und hebt signifikante Verbesserungen hervor. Die experimentellen Ergebnisse, die in einem Streudiagramm präsentiert werden, zeigen, dass die neue Version von Fizzer im Allgemeinen eine bessere Abdeckung der Branche erreicht, manchmal in einer Größenordnung. Darüber hinaus bietet das Kapitel Einblicke in den Aufbau, die Konfiguration und die Beiträge des Entwicklungsteams, was es zu einer umfassenden Ressource für das Verständnis fortgeschrittener Fuzzing-Techniken und ihrer praktischen Anwendung macht.
Hinweise
This work has been supported by the Czech Science Foundation grant GA23-06506S.
M. Trtík—Jury member.

1 Test-Generation Approach

Fuzzers [8] are tools that generate test inputs for a given program with the use of dynamic analysis. Gray-box fuzzers first instrument the given program to get some information about each program execution (e.g., which basic blocks were visited during the run). The instrumented program is then executed on some input and the obtained information about the execution is used to prepare the input for the next execution with the aim to cover some previously uncovered code. This process is repeated until some goal or limit is reached. Fizzer focuses solely on achieving high branch coverage and applies the same process also in the Cover-Error category. Hence, only branch coverage is discussed in this paper.
While standard gray-box fuzzers prefer fast executions and thus gather only a little information, Fizzer collects more information and aims to create more targeted inputs. More precisely, Fizzer tracks the evaluation of atomic Boolean expressions (abe) in the given program, which are the Boolean expressions built from expressions of other types, e.g., (x > 21) or (string[i] == ’B’). Fizzer instruments the program such that each time an abe is evaluated, the program stores the current calling context (i.e., the sequence of function calls that are on the call stack), the value true or false of the abe, and the distance to the opposite value. For example, if the abe (x > 21) is evaluated to true, the distance to false is computed as x - 21. Fizzer  aims to generate tests that evaluate each abe in each reached calling context to both true and false.
Assume that some \( input \) leads to the evaluation of an abe in some calling context to true. The original version of Fizzer  [6] applies the following steps to evaluate it to false. First, it runs the sensitivity analysis to detect the \( input \) bytes that affect the distance (and thus probably also the value) of the considered abe in the considered calling context. For each bit of \( input \), sensitivity analysis executes the program on \( input \) with the bit flipped. If the distance changes, the whole byte containing the bit is marked as sensitive. As the second step, Fizzer runs the byteshare analysis if it has seen some \( input '\) that evaluates the same abe to false in a different calling context. The analysis replaces sensitive bytes in \( input \) by the corresponding sensitive bytes of \( input '\) and executes the program. If this still does not evaluate the abe in the considered calling context to false, Fizzer applies the last step. It performs a gradient descent on the sensitive bytes with the aim to minimize the absolute value of the distance and thus evaluate the abe to false. We refer to the full paper [7] for more details.
The original version of Fizzer received the bronze medal in Cover-Branches category of Test-Comp 2024. Still, we have identified some drawbacks. One of them is that the sensitivity analysis is very slow on programs with a large input, because it may require a program execution for each bit of the input. Moreover, it does not detect bytes that affect the value of abe  if flipping more than one bit is needed to change the distance. In the new version of Fizzer, sensitive bytes are computed by a dynamic taint-flow analysis. The program is executed on \( input \) and bytes returned from each call to __VERIFIER_nondet_*() are tainted by a fresh taint. All the taints are propagated through instructions, from their input to output arguments. Input bytes whose taint reaches the expression of the abe are marked as sensitive. While the original sensitivity analysis under-approximated the sensitive bytes, the new one over-approximates them.
The original approach also struggles with divergencies: a small modification of the input changes the execution path such that the desired abe in the desired calling context is missed. On the positive side, these divergences can cover some program parts not covered so far. On the negative side, the divergencies disrupt the original sensitivity analysis and gradient descent. To prevent them, we developed the local search analysis that internally applies several strategies for input generation including gradient descent. An important feature of the local search analysis is that it runs all its strategies in a local space of the target abe. We sketch this idea using an execution path with only two abe s x = y and x = 1, where x, y are 32-bit signed integers. Assume that the path was explored using the input where \(\texttt {x}=\texttt {y}=0\) and that we want to evaluate the second abe to true. The distance functions of the abe s are \(f(\texttt {x},\texttt {y})=\texttt {x}-\texttt {y}\) and \(g(\texttt {x})=\texttt {x} - 1\), respectively. Observe that the second distance depends only on x. Mutating \(\texttt {x}\) alone would produce an input for which the execution diverges from the original path on the first abe. In order to prevent this, we build a stack of local spaces, one for each abe along the path. Intuitively, the local space captures all values of sensitive bytes that keep the distance of the corresponding abe unchanged. For example, the distance for the first abe x = y given \(\texttt {x}=\texttt {y}=0\) is \(f(0,0)=0\) and thus the local space is given by equality \(f(\texttt {x},\texttt {y})=f(0,0)\), i.e., \(\texttt {x}-\texttt {y}=0\). Now assume that gradient descent applied on the distance of the second abe wants to execute the program with \(\texttt {x} = 1\). The original Fizzer would directly run the program on \(\texttt {x}=1\) and \(\texttt {y}=0\) and the execution would diverge on the first abe. The local search analysis uses the local space of the first abe to figure out that in order not to diverge from the path, y should be set to 1 and runs the program on \(\texttt {x}=\texttt {y}=1\) and the execution will not diverge. Due to space limitations we provide more details about the approach in [5].

2 Software Architecture

Fizzer is implemented in C++, significantly depends on the LLVM infrastructure, and is divided into two executable parts: Instrumenter and Server. The task of Instrumenter is to instrument a given program with the code tracking and reporting the information about program executions. Server schedules and runs the analyses described in the previous section, in particular for generating new inputs and starting executions of the instrumented program. The instrumented program runs in a separate process and communicates with the Server using shared memory, so if the program crashes, Server can still get the information about the run and continue with test generation.
The current version of Fizzer additionally compiles the given program to our new custom Simple Assembly LAnguage (SALA), which is basically a simplified version of LLVM, but SALA instructions do not contain type information, there are no LLVM registers nor intrinsics, and functions are not required to be in SSA form. Server contains a SALA interpreter that performs the taint-flow analysis described in the previous section.
Fig. 1:
Comparison of coverages achieved by the original [6] and new Fizzer  on Cover-Branches benchmarks of Test-Comp 2024.

3 Strengths and Weaknesses

The current Fizzer has mostly the same strengths and weaknesses as the original version [6]. Fizzer is still a relatively simple and very compact tool with minimal external dependencies. It can be applied to programs of arbitrary size and programs that use external functions available only in compiled form.
The main weaknesses of Fizzer stem from the fact that it is a fuzzer that significantly relies on gradient descent. First, being a fuzzer, it generates tests by executing the program. These executions must have some resource limits specified (e.g., number of evaluated abes, input size, size of calling context, time limit). Fizzer  thus explores only prefixes of program paths and consequently tends to focus on parts of the program close to the entry point. This weakness is partially mitigated by running an optimizer after the fuzzing finishes. Optimizer extends the limits and re-runs the program executions that exceeded the standard limits. Second, being reliant on gradient descent, it tends to perform poorly if the branching conditions are non-linear as the standard gradient descent only computes information about linear approximations of the objective function.
Some of the weaknesses of the original Fizzer mentioned in Section 1, i.e., expensive sensitivity analysis and divergences caused by small modifications of inputs, are partially mitigated by the new taint-flow analysis and search in local spaces. This is supported by our experimental evaluation on all Cover-Branches benchmarks from Test-Comp 2024 [1] (our experiments use less resources than the Test-Comp 2024 setting, in particular the time limit was set to 300 s per benchmark). The results presented by the scatter plot in Figure 1 show that the new version of Fizzer generally achieves better branch coverage, sometimes even by an order of magnitude. On average, the new version of Fizzer achieved the coverage 66.5% while the original version achieved 59.8%. In Test-Comp 2025 [3], the new version of Fizzer finished in the 4th place in Cover-Branches category and won a bronze medal in Overall [2].

4 Tool Setup and Configuration

Fizzer can be downloaded either as a binary or as a source code (links are in Section 6). For the source code of the version used in the competition, check out the tag TESTCOMP25. The README.md file in the root of the repository contains instructions for building the tool. The tool is used via sbt-fizzer.py script:
All results including the generated tests will be stored under the directory <output-dir>. The list of all available options can be obtained by the command sbt-fizzer.py --help. Options used in the competition are:
  • max_seconds 865    The timeout for the entire fuzzing process.
  • optimizer_max_seconds 30    The timeout for the optimizer.
  • max_exec_milliseconds 500    The timeout for each program execution.
  • max_exec_megabytes 13312   The memory limit for each program execution.
  • max_stdin_bytes 65536    The upper bound for the number of input bytes.
  • stdin_model stdin_replay_bytes_then_repeat_zero   An input model: given input bytes followed by bytes of the value 0.
  • test_type testcomp    The format for the generated tests.

5 Software Project and Contributors

Fizzer has been developed at the Faculty of Informatics of Masaryk University by Marek Trtík and Lukáš Urban (contributed to the original version). Martin Jonáš and Jan Strejček participated in discussions and contributed to the project by some ideas. The tool is open-source and available under the zlib license.

6 Data-Availability statement

Fizzer is available in a binary form at Zenodo [4] and the source code is available at GitHub:
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://​creativecommons.​org/​licenses/​by/​4.​0/​), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Literatur
3.
Zurück zum Zitat Beyer, D.: Advances in automatic software testing: Test-Comp 2025. In: Proc. FASE. Springer (2025) Beyer, D.: Advances in automatic software testing: Test-Comp 2025. In: Proc. FASE. Springer (2025)
6.
Zurück zum Zitat Jonáš, M., Strejček, J., Trtík, M., Urban, L.: Fizzer: New gray-box fuzzer (competition contribution). In: Beyer, D., Cavalcanti, A. (eds.) Fundamental Approaches to Software Engineering - 27th International Conference, FASE 2024, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2024, Luxembourg City, Luxembourg, April 6-11, 2024, Proceedings. Lecture Notes in Computer Science, vol. 14573, pp. 309–313. Springer (2024), https://doi.org/10.1007/978-3-031-57259-3_17 Jonáš, M., Strejček, J., Trtík, M., Urban, L.: Fizzer: New gray-box fuzzer (competition contribution). In: Beyer, D., Cavalcanti, A. (eds.) Fundamental Approaches to Software Engineering - 27th International Conference, FASE 2024, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2024, Luxembourg City, Luxembourg, April 6-11, 2024, Proceedings. Lecture Notes in Computer Science, vol. 14573, pp. 309–313. Springer (2024), https://​doi.​org/​10.​1007/​978-3-031-57259-3_​17
7.
Zurück zum Zitat Jonáš, M., Strejček, J., Trtík, M., Urban, L.: Gray-box fuzzing via gradient descent and Boolean expression coverage. In: Finkbeiner, B., Kovács, L. (eds.) Tools and Algorithms for the Construction and Analysis of Systems - 30th International Conference, TACAS 2024, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2024, Luxembourg City, Luxembourg, April 6-11, 2024, Proceedings, Part III. Lecture Notes in Computer Science, vol. 14572, pp. 90–109. Springer (2024), https://doi.org/10.1007/978-3-031-57256-2_5 Jonáš, M., Strejček, J., Trtík, M., Urban, L.: Gray-box fuzzing via gradient descent and Boolean expression coverage. In: Finkbeiner, B., Kovács, L. (eds.) Tools and Algorithms for the Construction and Analysis of Systems - 30th International Conference, TACAS 2024, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2024, Luxembourg City, Luxembourg, April 6-11, 2024, Proceedings, Part III. Lecture Notes in Computer Science, vol. 14572, pp. 90–109. Springer (2024), https://​doi.​org/​10.​1007/​978-3-031-57256-2_​5
Metadaten
Titel
Fizzer with Local Space Fuzzing
verfasst von
Martin Jonáš
Jan Strejček
Marek Trtík
Copyright-Jahr
2025
DOI
https://doi.org/10.1007/978-3-031-90900-9_14