This chapter examines the integration of large language models (LLMs) with direct kernel fuzzing to improve the detection of software vulnerabilities in operating systems. The authors present a novel framework, SyzAgent, which extends the capabilities of the existing Syzkaller tool by incorporating real-time feedback from LLMs. This dynamic approach allows the fuzzing process to adapt to changes in the kernel, improving efficiency and coverage. The chapter details the architecture of SyzAgent and describes its components and their interactions with Syzkaller. It also provides preliminary experimental results showing the effectiveness of the LLM-assisted method in breaking coverage plateaus and outperforming traditional fuzzing tools. The authors discuss the challenges and insights from integrating LLMs with kernel fuzzers and highlight the potential of this approach for future advances in software security. The chapter concludes with a discussion of future directions and improvements for SyzAgent, emphasizing the need for further research to fully exploit the capabilities of LLMs in kernel fuzzing.
AI-generated
This summary of the content was generated with the help of AI.
Abstract
Direct kernel fuzzing is a targeted approach that focuses on specific areas of the kernel, effectively addressing the challenges of frequent updates and the inherent complexity of operating systems, which are critical infrastructure. This paper introduces SyzAgent, a framework integrating LLMs with the state-of-the-art kernel fuzzer Syzkaller, where the LLMs are used to guide the mutation and generation of test cases in real time. We present preliminary results demonstrating that this method is effective in around 67% of the cases in our benchmark.
1 Introduction
Operating systems (OS) are crucial in modern computing infrastructures, making the correctness and reliability of an OS kernel vital. Fuzzing is a common method for identifying software vulnerabilities and has been notably applied in kernel testing with tools like Syzkaller [3], which has identified many bugs. Despite progress, the complexity of modern OSes can impede fuzzers from reaching deeper code paths. To improve fuzzing efficiency and coverage, researchers have explored ways to better discover and utilize the dependency relations between system calls and tasks [8, 9]. Other works have employed reinforcement learning techniques [11] and static analysis methods [6, 14] to target previously unreached code during fuzzing.
With the rapid advancement of generative AI [5], the use of large language models (LLMs) in system fuzzing is increasingly recognized [12]. The KernelGPT method [13] uses LLMs to generate Syzlang, a domain-specific language for describing system calls, which improves seed generation and test case creation in Syzkaller [4].
Instead of general kernel fuzzing as performed by Syzkaller, this work emphasizes direct kernel fuzzing, which targets specific, often critical areas within the OS kernel to manage the challenges posed by frequent updates and rapid iterations. The SyzDirect approach [10] extends Syzkaller by leveraging the call graph and a resource model to provide structured guidance for test case generation, making direct kernel fuzzing more effective.
In this work, we integrate LLMs with direct fuzzing of the OS kernel. The source code and intermediate fuzzing results are fed to the LLM dynamically to retrieve guidance for test case generation. Unlike KernelGPT, which focuses on generating Syzlang specifications, and SyzDirect, which utilizes pre-built guidance from the call graph and resource model, our approach employs real-time feedback from the LLM to adapt to changes in the kernel. We implemented our framework, SyzAgent, to achieve this integration and provide preliminary experimental results demonstrating the effectiveness of the approach. Without loss of generality, GPT-4o [1] is used for the experiments in this paper. In addition, we share insights into the challenges and experiences encountered while integrating LLMs with kernel fuzzers.
2 Motivating Example
Consider a commit changing the function __anon_inode_getfd in the Linux kernel, referred to as the target function. Our objective is to test the newly introduced code in this commit using guidance from an LLM.
By compiling and analyzing the Linux kernel, we generate a set of call paths, which represent the routes from system calls to the target function in the kernel’s call graph. Below are two example call paths, where func1\(\rightarrow \)func2 indicates that func2 is called within the body of func1:
The first path illustrates a direct call to the target function from a system call (inotify_init), while the second involves an indirect call through other functions within two steps. We collect these paths to inform the LLM about potential triggers for the target function. Once the paths are identified, the source code along them, referred to as calling code, is extracted and used to formulate the initial prompt, as shown in Prompt 1.1.
Upon receiving the initial prompt, the LLM identifies the following system calls that may potentially interact with the target function.
These system calls represent possible entry points to the target function. The reason for using the LLM in this analysis is its potential to identify additional system calls: because it has been trained on extensive open-source project data, an LLM may provide more diverse outcomes.
Subsequently, a kernel fuzzer like Syzkaller is launched and generates test cases in which these system calls are chosen with increased probability, so as to reach the target. During the fuzzing process, whenever 500 test cases have been executed, those covering functions within 2 steps of the target function are collected, and the covered source code is recorded to create a feedback prompt, as shown in Prompt 1.2.
After receiving the feedback prompt, the LLM provides an updated list of system calls. With real test cases available, the LLM is more likely to introduce related system calls. In this example, the LLM adds two more system calls, drm_syncobj_handle_to_fd_ioctl and mmap, to the initial list. This improvement in system call generation allows the fuzzing process to cover the target function more frequently in subsequent runs.
3 Approach
The architecture of our proposed approach is depicted in Figure 1, comprising two parts: 1) the original kernel fuzzer Syzkaller, and 2) its LLM extension, SyzAgent. Below, we introduce each part and explain their interactions.
Fig. 1. SyzAgent extends the existing Syzkaller by applying an LLM to kernel fuzzing.
3.1 Syzkaller
Syzkaller fuzzes the OS kernel by executing finite sequences of system calls with their arguments, where each system call comes from a set of system calls S. It creates three task types in the work queue (as shown in Figure 1):
Generation Initial seed programs are generated from manually tuned templates to ensure deeper test cases.
Mutation Mutation is applied to programs selected from a corpus (i.e., previously executed programs with new coverage). During this phase, system calls and their arguments are modified, including adding, removing, or changing system calls. This process is guided by the fuzzing state, which includes a choice table (ct): a two-dimensional array where \(ct[c][c']\) represents the probability of generating system call \(c'\) after \(c\). System call insertion is either random (5% of the time) or based on probabilities from the choice table. Arguments are generated by considering available resources at the insertion point.
Triage Test cases that triggered new coverage will be verified and minimized by removing redundant system calls, with successful cases added to the corpus as new seeds. Triage tasks take priority in the fuzzing process, followed by generation and mutation if no triage tasks are available.
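The insertion rule described for the mutation task can be sketched as follows. This is a minimal illustration and not Syzkaller's actual code (Syzkaller is written in Go): the function name `pick_next_call`, the nested-dictionary layout of the choice table, and the fallback for an all-zero row are assumptions made for this example.

```python
import random


def pick_next_call(ct, prev, calls, rng, random_rate=0.05):
    """Pick the system call to insert after `prev`.

    ct[prev][c] is the weight of generating call c after prev
    (hypothetical layout: a dict of dicts keyed by call name).
    """
    # With a small probability (5% in the text), ignore the table entirely.
    if rng.random() < random_rate:
        return rng.choice(calls)
    weights = [ct[prev][c] for c in calls]
    # Assumption: fall back to a uniform choice if the row carries no weight.
    if sum(weights) == 0:
        return rng.choice(calls)
    return rng.choices(calls, weights=weights, k=1)[0]


if __name__ == "__main__":
    calls = ["open", "mmap", "close"]
    ct = {"open": {"open": 1, "mmap": 3, "close": 1}}
    rng = random.Random(0)
    print([pick_next_call(ct, "open", calls, rng) for _ in range(5)])
```

Raising the weight of a call in a row makes it proportionally more likely to follow `prev`, which is the lever SyzAgent uses later when it modifies the choice table.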
3.2 SyzAgent
We propose SyzAgent to extend Syzkaller, as shown in Figure 1. It integrates an LLM into Syzkaller's generation and modification of the choice table. The LLM influences the fuzzing process in three key procedures: 1) it constructs the initial choice table based on static analysis and LLM analysis results; 2) during fuzzing, it collects running test cases with their coverage information and formulates a feedback prompt to obtain fuzzing guidance from the LLM; and 3) it updates the choice table using the guidance provided by the LLM. The extension consists of four new components: the pre-processor, static analyzer, address extractor, and LLM interface, as shown in Figure 1.
Pre-Processor The pre-processor compiles the OS kernel's source code into a binary image for testing and generates intermediate representation (IR) files using the LLVM framework [7]. These IR files are used for static analysis and are generated without any optimization, so that they reflect the calling relations in as much detail as possible. Additionally, the pre-processor gathers information on all C functions present in the Linux kernel.
Static Analyzer The static analyzer parses and analyzes the IR files generated by the pre-processor, resulting in the call graph of the OS kernel. A call graph of the Linux kernel is a graph \( G = (C \cup F, E) \), where: \( C \) is a finite set of system calls, \( F \) is a finite set of other functions, and \( E \subseteq (C \cup F) \times (C \cup F) \) represents the set of directed edges in the call graph, showing the calling relationships between functions. Given a target function \(f_t\), the static analyzer performs the following tasks:
Job 1: Find all paths from some \( s \in C \) to \( f_t \). This corresponds to the first type of call paths in the motivating example.
Job 2: Find all paths from any function \( f \) to \( f_t \) with length \( l \), where \( l < \textbf{k} \) and \( \textbf{k} \in \mathbb {N} \) is a constant. This corresponds to the second type of call paths in the motivating example.
Job 3: Identify all close functions \( f_c \) within a specific close range constant \( \textbf{d} \in \mathbb {N} \), where \( f_c \) is an \( n \)-step predecessor of \( f_t \) in the call graph and \( n \le \textbf{d} \). A predecessor refers to a function that directly calls or influences another function in a call path. The set of these functions is called the close area.
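The three jobs can be sketched on a toy call graph as follows. This is an illustrative sketch only: the graph, the function names (`sys_a`, `f1`, ...), the helpers `paths_to` and `close_area`, and the length bound used for Job 1 are all assumptions for the example, and the actual analysis runs on LLVM IR rather than on a Python dictionary.

```python
from collections import deque


def reverse_graph(edges):
    """Map each callee to the set of its direct callers."""
    rev = {}
    for caller, callees in edges.items():
        for callee in callees:
            rev.setdefault(callee, set()).add(caller)
    return rev


def paths_to(edges, target, max_edges):
    """Enumerate simple call paths ending at `target` with 1..max_edges edges."""
    rev = reverse_graph(edges)
    found = []

    def walk(path):  # path runs from its current head down to the target
        if len(path) > 1:
            found.append(list(path))
        if len(path) - 1 == max_edges:
            return
        for pred in sorted(rev.get(path[0], ())):
            if pred not in path:  # keep paths simple, so recursion terminates
                walk([pred] + path)

    walk([target])
    return found


def close_area(edges, target, d):
    """Job 3: map each n-step predecessor of `target` (1 <= n <= d) to n."""
    rev = reverse_graph(edges)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        f = queue.popleft()
        if dist[f] == d:
            continue
        for pred in rev.get(f, ()):
            if pred not in dist:
                dist[pred] = dist[f] + 1
                queue.append(pred)
    del dist[target]
    return dist


# Toy call graph (hypothetical names): caller -> list of callees.
EDGES = {"sys_a": ["target"], "sys_b": ["f1"], "f1": ["f2"],
         "f2": ["target"], "f3": ["f1"]}
SYSCALLS = {"sys_a", "sys_b"}

# Job 1: paths that start at a system call (bounded here to 3 edges).
job1 = [p for p in paths_to(EDGES, "target", 3) if p[0] in SYSCALLS]
# Job 2: paths from any function with length l < k, here k = 3.
job2 = paths_to(EDGES, "target", 3 - 1)
# Job 3: close area for d = 2.
job3 = close_area(EDGES, "target", 2)
```

Both helpers walk the reversed graph from the target, so their cost depends on the number of predecessors rather than on the size of the whole kernel call graph.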
Address Extractor The address extractor matches program counter (PC) points in the compiled Linux kernel binary to their actual locations. Syzkaller uses KCOV [2] for coverage feedback, tracking the PC points reached by test cases. To improve efficiency, PC points in the close area are extracted in advance for quicker coverage checks.
LLM-Interface The LLM interface communicates with the LLM by sending the initial and feedback prompts. It then extracts a set of system calls, \( S_{inc} \), from the LLM feedback to update the choice table.
The choice table is modified as follows: for any system calls \( c_1 \) and \( c_2 \), let \( ct_0[c_1][c_2] \) represent the original choice table value in Syzkaller. The LLM-updated value, \( ct_1[c_1][c_2] \), is set to \( ct_0[c_1][c_2] + 1 \) if either \( c_1 \in S_{inc} \) or \( c_2 \in S_{inc} \); otherwise, \( ct_1[c_1][c_2] = ct_0[c_1][c_2] \). The final choice table is computed by normalizing \( ct_1 \) for each row: \( ct[c_1][c_2] = \dfrac{ct_1[c_1][c_2]}{\sum _i ct_1[c_1][c_i]}. \)
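As a worked illustration of this update rule, the following sketch adds 1 to every entry whose row or column call is in \( S_{inc} \) and then normalizes each row. The dictionary layout, the name `update_choice_table`, and the handling of an all-zero row are assumptions for the example, not SyzAgent's implementation.

```python
def update_choice_table(ct0, s_inc):
    """Return the normalized table ct computed from ct0 and the set S_inc.

    ct0[c1][c2] is the original value; every call is assumed to appear
    both as a row key and as a column key.
    """
    calls = list(ct0)
    ct = {}
    for c1 in calls:
        # ct1[c1][c2] = ct0[c1][c2] + 1 if c1 or c2 is in S_inc.
        row = {
            c2: ct0[c1][c2] + (1 if (c1 in s_inc or c2 in s_inc) else 0)
            for c2 in calls
        }
        total = sum(row.values())
        # Row-wise normalization; an all-zero row stays zero (assumption).
        ct[c1] = {c2: (v / total if total else 0.0) for c2, v in row.items()}
    return ct


ct0 = {"open": {"open": 1, "mmap": 1}, "mmap": {"open": 2, "mmap": 0}}
ct = update_choice_table(ct0, {"mmap"})
# Row "open": values (1, 2) normalize to (1/3, 2/3).
# Row "mmap": values (3, 1) normalize to (0.75, 0.25).
```

In the row for `open`, only the `mmap` column is boosted, so `mmap` becomes twice as likely as `open` to follow it; in the row for `mmap`, every column receives the same increment.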
Apart from the components above, it is worth mentioning that LLM analysis runs slower than fuzzing. We therefore sample from all test cases and run the LLM analysis in parallel with fuzzing. For every 500 cases sampled, some of the cases that covered the close area are selected randomly for feedback prompting via the LLM interface.
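The batching step can be sketched as below. The class name `FeedbackSampler`, the cap `max_selected` on how many cases go into one feedback prompt, and the representation of coverage as a set of PC values are assumptions for illustration; the text only states that some covering cases are chosen at random per 500 sampled cases.

```python
import random


class FeedbackSampler:
    """Collect sampled test cases and pick feedback candidates per batch."""

    def __init__(self, close_pcs, batch_size=500, max_selected=5, seed=0):
        self.close_pcs = frozenset(close_pcs)  # PCs extracted in advance
        self.batch_size = batch_size
        self.max_selected = max_selected       # hypothetical cap per prompt
        self._rng = random.Random(seed)
        self._seen = 0
        self._hits = []

    def observe(self, program, covered_pcs):
        """Record one sampled case; return the selection when a batch ends."""
        self._seen += 1
        # A case "hits" if its coverage intersects the close area.
        if not self.close_pcs.isdisjoint(covered_pcs):
            self._hits.append(program)
        if self._seen < self.batch_size:
            return None
        k = min(self.max_selected, len(self._hits))
        selected = self._rng.sample(self._hits, k)
        self._seen, self._hits = 0, []
        return selected


if __name__ == "__main__":
    sampler = FeedbackSampler(close_pcs={0x10, 0x20}, batch_size=3, max_selected=1)
    for prog, pcs in [("p1", {0x10}), ("p2", {0x99}), ("p3", {0x20, 0x30})]:
        picked = sampler.observe(prog, pcs)
    print(picked)  # one of the two programs that covered the close area
```

Because `observe` only appends to a list and tests set membership, it stays cheap on the fuzzing path, while the returned selection can be handed to a slower LLM worker running in parallel.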
4 Preliminary Experimental Results
We conducted experiments on fuzzing the Linux kernel to demonstrate that our LLM-driven SyzAgent method: 1) effectively adapts the existing vanilla Syzkaller tool, even breaking its coverage plateau, and 2) offers advantages over the specialized direct kernel fuzzing tool, SyzDirect.
Our experimental setup consisted of a PC equipped with a 13th Gen Intel Core i7-13700 processor and 128GB of memory. The virtual machine under test was configured on QEMU, running a Linux system on an AMD architecture with 4 CPUs and 4GB of memory. Given the 12.8k token limit of the LLM-interface in GPT-4o [1], we selected target functions based on the principle that no function in their call paths should have more than five predecessors, to prevent an explosion in the number of call paths to the target function. From this set, we selected a total of 27 target functions that our tool can currently process as our benchmark.
SyzAgent vs Syzkaller In this experiment, each target function was fuzzed using both SyzAgent and Syzkaller, with each tool tested three times per function, and each run limited to two hours. The fuzzing results from SyzAgent and Syzkaller are summarized in Table 1. Out of the 27 cases, SyzAgent achieved a hit rate (the ratio of the number of test cases that hit the close area to the number of all test cases) that surpassed Syzkaller's by more than 10 percentage points in 8 cases, while it underperformed compared to Syzkaller in only 5 cases. We also compute the improvement relative to the baseline, \(\omega = \frac{\text {Avg. Diff}}{\text {Avg. Syzkaller Hit Rate}}\); 18 of the 27 cases (67%) have \(\omega \ge 10\%\).
Table 1. Experimental data comparison between the two methods. "Dist." denotes the minimum length of a call path from some system call to the target function. "Hit %" is the percentage of sampled test cases that covered the close area. "Avg. Diff" denotes the average hit rate of SyzAgent minus that of Syzkaller across all runs.

| ID | Target Function | Dist. | SyzAgent Hit % (Run 1) | SyzAgent Hit % (Run 2) | SyzAgent Hit % (Run 3) | Syzkaller Hit % (Run 1) | Syzkaller Hit % (Run 2) | Syzkaller Hit % (Run 3) | Avg. Diff |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ksys_semctl | 1 | 28.27 | 28.89 | 31.8 | 3.8 | 5.15 | 1.11 | 26.3 |
| 2 | __sys_setfsgid | 1 | 19.1 | 13.89 | 9.45 | 0.0 | 0.03 | 0.0 | 14.14 |
| 3 | do_sched_yield | 1 | 25.15 | 26.32 | 41.7 | 20.57 | 30.14 | 26.86 | 5.2 |
| 4 | vm_acct_memory | 2 | 32.4 | 28.82 | 32.84 | 22.12 | 17.83 | 15.27 | 12.95 |
| 5 | __shmem_file_setup | 2 | 8.82 | 7.32 | 7.81 | 3.79 | 6.49 | 4.93 | 2.91 |
| 6 | io_register_iowq_m... | 2 | 22.05 | 15.63 | 18.26 | 1.02 | 3.28 | 2.59 | 16.35 |
| 7 | __anon_inode_getfile | 2 | 30.47 | 30.38 | 30.65 | 9.97 | 11.89 | 11.68 | 19.32 |
| 8 | copy_fsxattr_from... | 3 | 56.8 | 56.99 | 54.75 | 51.0 | 49.34 | 50.1 | 6.03 |
| 9 | __io_uring_add_... | 3 | 36.02 | 34.97 | 28.0 | 8.9 | 2.65 | 11.76 | 25.23 |
| 10 | keyring_ptr_to_key | 3 | 30.26 | 21.64 | 23.95 | 6.58 | 2.48 | 6.47 | 20.11 |
| 11 | mnt_get_writers | 3 | 77.1 | 73.33 | 75.44 | 67.6 | 73.84 | 80.1 | 1.44 |
| 12 | futex_requeue_pi_... | 3 | 0.92 | 0.0 | 0.0 | 2.94 | 0.31 | 2.0 | -1.44 |
| 13 | wait_for_device_probe | 4 | 0.33 | 0.31 | 0.12 | 0.35 | 0.14 | 0.13 | 0.05 |
| 14 | memcpy_to_page | 4 | 24.07 | 29.73 | 34.0 | 7.1 | 9.02 | 0.0 | 23.89 |
| 15 | kimage_is_dest... | 5 | 1.74 | 7.69 | 8.05 | 0.0 | 0.06 | 0.66 | 5.59 |
| 16 | find_lock_entries | 5 | 40.68 | 37.72 | 33.88 | 38.28 | 34.67 | 39.23 | 0.03 |
| 17 | fsnotify_data_sb | 5 | 58.2 | 57.33 | 61.22 | 55.73 | 55.72 | 60.94 | 1.45 |
| 18 | security_inode_set... | 5 | 12.02 | 10.74 | 12.86 | 3.39 | 4.78 | 3.61 | 7.94 |
| 19 | free_partitions | 6 | 13.48 | 21.94 | 14.57 | 28.17 | 24.73 | 25.97 | -9.63 |
| 20 | bpf_prog_free | 6 | 0.56 | 5.12 | 3.68 | 1.25 | 1.5 | 3.37 | 1.08 |
| 21 | locks_delete_glob... | 6 | 0.59 | 0.58 | 0.0 | 0.73 | 0.04 | 0.56 | -0.05 |
| 22 | pmd_none_or_clear_bad | 7 | 12.92 | 11.3 | 16.47 | 14.72 | 19.68 | 18.11 | -3.94 |
| 23 | __submit_bio_noac... | 7 | 31.89 | 21.5 | 19.88 | 20.09 | 28.69 | 27.06 | -0.86 |
| 24 | srcu_read_lock_nm... | 7 | 19.89 | 45.15 | 26.51 | 23.62 | 25.22 | 20.31 | 7.47 |
| 25 | trace_wbc_writepage | 8 | 1.82 | 0.81 | 3.03 | 0.79 | 1.71 | 0.6 | 0.85 |
| 26 | sk_set_bit | 8 | 8.21 | 10.61 | 6.48 | 3.09 | 3.23 | 6.61 | 4.12 |
| 27 | sidtab_search_core | 8 | 76.48 | 77.85 | 75.68 | 73.14 | 76.26 | 73.34 | 2.42 |
These results confirm that the LLM integration in SyzAgent effectively improves Syzkaller’s performance in direct fuzzing, as the majority of cases achieved a higher hit rate when using SyzAgent.
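The two summary statistics can be recomputed from the per-run hit rates. The sketch below does so for four rows copied from Table 1 (IDs 1, 3, 11 and 19); the function name and the data layout are assumptions for illustration, and a baseline average of zero is not handled because it does not occur in these rows.

```python
def avg_diff_and_omega(agent_runs, syzkaller_runs):
    """Return (Avg. Diff, omega) for one target function.

    Avg. Diff is the mean SyzAgent hit rate minus the mean Syzkaller hit
    rate; omega divides that difference by the mean Syzkaller hit rate.
    """
    base = sum(syzkaller_runs) / len(syzkaller_runs)
    diff = sum(agent_runs) / len(agent_runs) - base
    return diff, diff / base


# Hit % per run, copied from Table 1: ID -> (SyzAgent runs, Syzkaller runs).
ROWS = {
    1: ([28.27, 28.89, 31.8], [3.8, 5.15, 1.11]),
    3: ([25.15, 26.32, 41.7], [20.57, 30.14, 26.86]),
    11: ([77.1, 73.33, 75.44], [67.6, 73.84, 80.1]),
    19: ([13.48, 21.94, 14.57], [28.17, 24.73, 25.97]),
}

if __name__ == "__main__":
    for case_id, (agent, syz) in ROWS.items():
        diff, omega = avg_diff_and_omega(agent, syz)
        print(case_id, round(diff, 2), omega >= 0.10)
```

Case 3 shows why the two criteria differ: its average difference of 5.2 percentage points is below the 10-point threshold, yet relative to a baseline of about 25.9% it corresponds to \(\omega \approx 20\%\), so it counts toward the 18 cases with \(\omega \ge 10\%\).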
Fig. 2. Coverage-execution graph for target function sk_set_bit within 2 h (one line for Syzkaller and one line for SyzAgent).
While this paper primarily focuses on direct kernel fuzzing, we observed during our experiments that SyzAgent successfully breaks the Syzkaller coverage plateau. Among the 27 direct fuzzing cases, 5 cases (IDs 4, 19, 25, 26, 27) achieved higher coverage within a fixed number of test cases. Figure 2 illustrates the coverage progression for case 27, where deeper target functions were tested, partially validating our hypothesis that the main reason for the plateau is that the fuzzer lacks a seed that can reach deeper code. However, we also noted that in 6 cases the coverage performance of SyzAgent was inferior to that of Syzkaller, while the remaining cases showed similar performance between the two tools. We identify this as another promising direction emerging from this work; it will be valuable to investigate this hypothesis further and to explore how to harness the LLM's capabilities to systematically improve kernel fuzzing coverage.
Table 2. Exemplar system call entry analysis results from SyzAgent and SyzDirect. SyzAgent has advantages over SyzDirect in identifying system call relationships, while SyzDirect excels in detecting argument types, a feature not currently supported by SyzAgent (both kinds of entries are highlighted in the table).
SyzAgent vs SyzDirect An end-to-end comparison with the SyzDirect tool was not feasible due to multiple issues encountered during its installation and configuration, and due to its manual instrumentation requirements.
Nevertheless, we managed to run SyzDirect's stages for system call entry analysis and compared their output with the LLM-generated results from SyzAgent. Table 2 presents the results for three target functions in the Linux kernel. In the table, separate highlights mark the system calls for which SyzAgent outperforms SyzDirect and those for which SyzDirect performs better.
In cases with IDs 2 and 3, SyzAgent identified three additional system call entries compared to SyzDirect. After manually verifying these cases, we found that SyzDirect’s call graph analysis was less precise than that of SyzAgent. For example, in the first case, io_uring_enter did not appear to be beneficial for reaching the target function. However, SyzDirect outperformed SyzAgent in providing specific variants of system calls, likely due to its more detailed call graph model that incorporates resource-producing and consuming relationships, which are currently not included in SyzAgent analysis. This results in a finer-grained analysis by SyzDirect compared to that of SyzAgent.
5 Conclusion and Discussion
In this work, we explored the integration of LLM capabilities with OS kernel fuzzers in real time. Based on our preliminary experimental results, this approach appears effective for direct fuzzing and warrants further investigation. However, our work is still at an early stage, as several advanced techniques, such as the relational graph approach from [9] and more sophisticated static analysis methods like those in [6, 14], have not yet been incorporated. Our work also lacks validation of whether the system calls are correctly generated.
At the implementation level, there are several ways SyzAgent could be enhanced: 1) splitting the calling code into smaller segments to facilitate deeper exploration of target functions; 2) integrating more closely with Syzkaller to enable LLMs to contribute to argument mutation; and 3) using the distance to the target function of cases that cover nearby areas to select the most promising test cases for generating feedback prompts.
We regard LLMs as a viable solution to the complexities inherent in OS kernel fuzzing, thanks to the vast amount of data on which they are trained and optimized. The combination of LLM capabilities with our real-time feedback framework offers a flexible way to automatically adjust the fuzzing strategy. In the future, we believe it will be important to continue researching how LLMs can boost fuzzing coverage by utilizing information from intermediate results of static analysis and kernel documentation.
Acknowledgments
We gratefully thank Pierre Olivier for providing insights into the Linux kernel for this study. This work is partly supported by CAS Project for Young Scientists in Basic Research, Grant No.YSBR-040, ISCAS New Cultivation Project ISCAS-PYFX-202201, ISCAS Basic Research ISCAS-JCZD-202302 and the Ministry of Education, Singapore under its Academic Research Fund Tier 3 (Award ID: MOET32020-0004).
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.