This chapter examines equivalence checking of a Rust port of the musl C math library, focusing on the challenges and solutions involved in verifying the correctness of the ported code. The work uses the SMACK software verifier, which supports both C and Rust, to ensure that the behavior of the original C library matches the Rust implementation. The chapter delves into the subtleties of floating-point arithmetic, which introduces additional complexity through rounding and special values such as NaN. It examines SMACK's toolflow, describing how it translates LLVM IR code into the Boogie intermediate verification language for verification with back-end solvers such as Z3 and CVC5. The chapter also discusses several techniques for improving the scalability of verification, including over-approximation of theories, restricting operations to avoid NaN values, and guided equivalence checking. Practical experiments are carried out on a set of benchmarks, revealing the performance of different verifiers and uncovering bugs in both the Rust library and the SMACK verifier. The chapter concludes with a discussion of related work and the potential for future applications of the developed techniques.
AI-Generated
This summary of the chapter's content was generated with the help of AI.
Abstract
Recent advances in satisfiability modulo theories have brought practical software verification within reach. The advent of the LLVM project presents a common representation which allows verification between programs written in different languages such as C and Rust. New programming languages such as Rust often have less complete standard libraries. In this work, we explore using the SMACK software verifier to check the equivalence of math library routines from the musl libm library and their translations into native Rust programs.
Notes
Z. Rakamarić—The author collaborated on this paper outside of his work at Amazon.
M. S. Baranowski, Z. Rakamarić and G. Gopalakrishnan—Supported in part by NSF Awards CCF 2217154, 2346394, and 2403379.
1 Introduction
Rust is a modern programming language focused on safety and native performance. Due to Rust’s novelty, it has been reliant on libraries written in other programming languages, often C, to provide many core functions. However, languages such as C do not have the same safety guarantees provided by Rust and are thus unsafe to use inside Rust. Many libraries provide safe Rust bindings to such C language libraries. But this requires the library developer to completely contain any potential unsafety within their library. Additionally, the foreign code used in this process may not be as optimizable as native Rust code (for example, it may not be inlined) and may inhibit debugging of such programs from Rust.
An alternative option is to instead port the library code from the source language to Rust. This has the upside of making the library native for Rust programs, and it limits or eliminates unsafe code used inside the library. Nonetheless, such ports may not always be correct because of subtleties related to the source programming language. This can be exacerbated by the use of floating-point arithmetic, which introduces additional subtleties that may not be as well known. Moreover, languages may not have complete support for floating-point arithmetic, which may lead to bugs that are difficult to detect within the language itself.
Formal equivalence checking can provide a solution here because it can show that the original-language program matches the behavior of the program ported to the target programming language. In this work we apply the SMACK software verifier to the Rust-language port of the musl C-language math library. SMACK is a software verifier that can work with both C and Rust language programs at once. We give our experiences using SMACK to show the equivalence of these programs and our extensions to the SMACK verifier to help show equivalence of more programs.
2 Background
In this section we describe the workings of the SMACK software verifier and give some needed background on floating-point arithmetic.
Fig. 1.
Toolflow of SMACK.
2.1 SMACK
SMACK [18] is a software verification toolchain that translates LLVM IR code into the Boogie intermediate verification language [3], which is in turn verified using back-end verifiers such as Boogie [3] and Corral [7]. SMACK has been predominantly used to verify LLVM IR programs produced by the Clang C compiler. It has been extended to support additional programming languages, in particular the Rust programming language [1]. We exploit this ability of SMACK to work with multiple languages together in this work.
Figure 1 shows the toolflow of SMACK, which works as follows:
1. The SMACK top-level script automates the entire toolflow. It determines which compilers to invoke and flags to use for program compilation. In the case of C programs, it invokes Clang to generate LLVM IR code, while including SMACK’s C language models. The models specify the semantics of common C library functions such as malloc, free, and string operations.
2. For Rust programs, SMACK invokes the Rust compiler [11] and links in its own Rust models. This includes Rust language support for verification functionality.
3. The common models file is then linked with the generated LLVM IR files to provide basic verification capabilities. This includes modeling dynamic memory allocation, supporting assertions and assumptions, and generating nondeterministic values.
4. The core llvm2bpl component takes an LLVM IR file as input and produces Boogie intermediate verification language (IVL) code that captures the semantics of LLVM IR instructions; it outputs a Boogie file for verification.
5. Finally, the Boogie or Corral back-end verifiers are invoked on the generated Boogie file. We use Z3 [4] and CVC5 [2] as their SMT solvers.
In this work, we use Corral and Boogie in their bounded verification modes, meaning that loops are unrolled and recursive functions are expanded up to a certain user-provided bound.
2.2 Floating-Point Arithmetic
Floating-point arithmetic is an approximation of real arithmetic using fixed-size representations. Because floating-point numbers have a fixed size, operations on them may involve rounding to keep the size of the result fixed. Rounding means that some familiar algebraic identities, such as the associativity of addition and multiplication, do not hold for floating-point operations. Because there is a finite range of values that floating-point numbers can represent, there are positive and negative infinity values to represent numbers outside this range. A final class of floating-point values is the NaN (Not a Number) values, which represent the results of invalid computations such as 0 divided by 0. NaN values propagate (e.g., adding a NaN to another value produces a NaN) so that these faulty calculations can be detected when convenient. A notable property of NaN values is that they are not equal to any other floating-point value, including other NaN values. In the SMT theory of floating-point numbers, there is a single NaN value, which compares unequal to every value, including itself.
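To make these properties concrete, the following short Rust program (plain Rust, no verifier involved; the specific constants are only illustrative) demonstrates both the loss of associativity under rounding and the fact that NaN is unequal to itself:

    fn main() {
        // Rounding breaks associativity: (a + b) + c and a + (b + c) differ.
        let (a, b, c) = (1.0e30_f32, -1.0e30_f32, 1.0_f32);
        assert_ne!((a + b) + c, a + (b + c)); // 1.0 vs. 0.0

        // NaN propagates through arithmetic and compares unequal to
        // everything, including itself.
        let nan = f32::NAN;
        assert!((nan + 1.0).is_nan());
        assert!(nan != nan);
    }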
3 Cross-Language Equivalence Checking with SMACK
In this work, we explore equivalence checking between functions originally written in the C programming language and their translations into the Rust programming language, which should be equivalent. Our main focus is on verifying the correct translation of a native Rust language math library compared to the source C language versions of the math functions. Additionally, because floating-point math can be unintuitive, a math library might contain subtle errors introduced during translation, for example by applying familiar algebraic simplifications that are not valid for floating-point calculations.
Cross-language equivalence checking in SMACK is done by exploiting its ability to handle programs written in multiple programming languages. Because SMACK works mainly at the LLVM and Boogie levels, rather than directly on source-level languages, it needs some source-level main program that ties together the functions to be verified.
There are some identities that are true, for example \(2.0*x = x+x\) and \(3.0*x = x+x+x\) for 32-bit floating-point numbers. Using the default floating-point rounding mode, round to nearest ties to even, it is even true that \(4.0*x = x+x+x+x\) and \(5.0*x=x+x+x+x+x\). Given this, we may convince ourselves that \(n*x=x+x+\cdots +x\), where n is an integer and the sum on the right adds x to itself n times. In the following example, we will show how to set up SMACK to verify that \(5.0*x = x+x+x+x+x\) for 32-bit floating-point numbers. In the Rust program in listing 1.1, the function rust_five_x on line 1 computes \(5.0*x\). Similarly, the C program in listing 1.2 computes \(x+x+x+x+x\) in the function c_five_x on line 1. We now have two functions whose equality we want to verify.
Because SMACK is a software verifier, we need to create a verification harness that ties the two programs together, in this case the function main on line 7 of listing 1.1.¹ The first step of the harness is to create a common input, x (line 8), for both functions we wish to check for equivalence. This input is a nondeterministic value, created through SMACK’s nondet family of functions. The input variable is then constrained (line 9) to not be NaN, because a NaN input would produce NaN results, which are always unequal. Then, the two functions we wish to check for equality are called with the common input x as their argument, and their return values are captured (lines 10 and 11). Finally, the call to verifier_assert (line 12) directs the verifier to check whether the two return values are equal.
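As a rough illustration of the shape of such a harness (a sketch only, not the verbatim listings: the import of SMACK's Rust models and the exact spelling of verifier_nondet, verifier_assume! and verifier_assert! are assumptions based on the description above), the Rust side might look as follows:

    // Sketch of a harness in the style of listing 1.1 (not the verbatim
    // listing). The SMACK Rust model names used here are assumptions.
    use smack::*;

    extern "C" {
        fn c_five_x(x: f32) -> f32; // the C implementation from listing 1.2
    }

    fn rust_five_x(x: f32) -> f32 {
        5.0 * x
    }

    fn main() {
        let x: f32 = 0.0f32.verifier_nondet(); // common nondeterministic input
        verifier_assume!(!x.is_nan());         // NaN results are never equal
        let r = rust_five_x(x);
        let c = unsafe { c_five_x(x) };
        verifier_assert!(r == c);              // check the two results agree
    }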
When we run SMACK on these files, it compiles the two programs to LLVM bitcode, links them into one program, and then translates it to Boogie for verification. On this example, SMACK reports no errors, meaning the two functions are equivalent for inputs that are not NaN.
Now we can use SMACK to check whether this pattern holds for \(6.0*x = x+x+x+x+x+x\). We set up the two programs (Listing 1.3 and Listing 1.4) similarly to the prior example, but update the calculations. When we run SMACK on these programs, it reports an error and shows a counter-example.
In this example, SMACK produces the counter-example x=9.223373136366404e+18. We can verify that \(6.0*x=5.534024101722168e+19\) while \(x+x+x+x+x+x=5.534023661917517e+19\). We can get SMACK to produce a counter-example that is easier to work with by restricting the range of possible values for x, for example by adding the constraint \(\texttt{verifier\_assume!(0.5 < x \&\& x < 1.0)}\). With this constraint, we may get \(x=0.5950928330421448\) as a counter-example, with \(6.0*x=3.570557117462158\) and \(x+x+x+x+x+x=3.570556879043579\). If we perform the additions one at a time, we can see that the final addition rounds in the opposite direction of the multiplication. Note that the specific counter-examples generated by SMACK can be influenced by the solver version, program structure, variable names, etc., and a particular run may not produce exactly these values.
3.1 Scalability
In this work, we encountered numerous examples where verification tasks did not finish. While this is a well known issue in software verification, we developed some techniques to help with verification performance on this class of programs.
3.2 Over-approximation of theories
The simplest approach is to first run SMACK without modeling floating-point or bit-vector operations. If SMACK reports no errors, then the functions are equivalent, since SMACK was able to show that both functions perform identical operations on their inputs. If SMACK does report an error, we first try enabling the floating-point model without enabling the bit-vector theory, since converting values between these two theories is expensive.
3.3 Restrict operations to not produce NaN
This may not always make verification go through, so additional approaches are needed. One approach is to transform the program so that floating-point arithmetic operations are constrained to not produce NaN values. We use a Boogie program pass that constrains the results of floating-point addition, subtraction, multiplication, and division to not be NaN. This helps verification because NaN values are not equal to any floating-point value – including NaN itself – so without the constraint the solver is essentially also attempting to prove that NaN values cannot occur. If the program is prevented from considering NaN values, then the solver can prove that the non-NaN versions of the functions are equivalent. When using this transformation, once the programs are shown to be equivalent in the absence of NaN values, the programs can then separately be shown to not produce NaN values.
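The pass operates on the generated Boogie code, but its effect can be pictured at the source level roughly as follows (an illustration of the idea only, not how the transformation is implemented):

    // Source-level picture of the NaN-restriction pass (illustrative only;
    // the real pass constrains the Boogie encodings of +, -, * and /).
    use smack::*;

    fn constrained_add(a: f64, b: f64) -> f64 {
        let r = a + b;
        verifier_assume!(!r.is_nan()); // arithmetic result assumed non-NaN
        r
    }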
3.4 Guided equivalence checking
Finally, we introduce a mechanism where we can try to guide the solver to focus on specific paths in the program to show that corresponding variables in the two programs are equivalent. We do this using new verifier functionality to store values from one program, and then use another function to check that the corresponding value in the other program is equal. We demonstrate this in listing 1.6 and listing 1.7. Naively, when examining these programs as a pair, the paths through the C-language function are independent of the paths through the Rust-language function, so there are 16 potential paths through the combined programs. We use verifier_equiv_store(value) to store the value of a variable on this path to be loaded later. In the Rust-language program, we use verifier_equiv_check(value) to verify that the stored value from the C-language program is the same as the value in the Rust-language program. This significantly reduces solving time for many programs, partially because it guides the solver toward learning that the paths through the two programs are the same; in this example, the four paths in the C-language program correspond to the four paths through the Rust-language program, reducing the number of potential paths from sixteen to four.
In addition to helping the SMT solver match equivalent expressions between the two programs, this also helps the solver constrain the possible values of the checked expressions. Because the check is done with floating-point equality, if the two expressions are shown to be equivalent, they are also shown to not produce NaN values. This enables the solver to simplify the expressions, which makes equivalence checking of dependent expressions easier.
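As an illustration of how the annotations are used (a sketch only; both sides are written in Rust here for brevity, whereas in listings 1.6 and 1.7 the storing side is the C program, and the equiv function names and bodies are assumptions rather than the actual listings):

    // Illustrative use of the guided equivalence-checking extensions.
    use smack::*;

    fn original(x: f64) -> f64 {
        let y = if x < 0.0 { -x } else { x };
        verifier_equiv_store(y); // record y along this path
        y * y
    }

    fn ported(x: f64) -> f64 {
        let y = if x < 0.0 { -x } else { x };
        verifier_equiv_check(y); // assert y equals the stored value
        y * y
    }

    fn main() {
        let x: f64 = 0.0f64.verifier_nondet();
        verifier_assume!(!x.is_nan());
        verifier_assert!(original(x) == ported(x));
    }

With the intermediate check in place, the solver only needs to relate the two y values and can then treat the remainder of the two functions uniformly, which is the path-pruning effect described above.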
3.5 Equivalence Checking Extensions
We added the equivalence checking extensions to SMACK so that they are generally available from the tool. These extensions are represented in the “equivalence extensions” part of fig. 1. The verifier_equiv_store functions are implemented in Boogie as shown on line 6 of listing 1.5, where $foeq is SMACK’s floating-point equivalence function corresponding to the SMT-LIB floating-point equality function fp.eq. We use a global counter equiv_store_counter to track each value stored in this way. This counter is incremented with each use so as to maintain unique identifiers for retrieving values stored by verifier_equiv_store. The implementation uses the theory of uninterpreted functions to allow the value to effectively be retrieved later.
The verifier_equiv_check functions are implemented in Boogie as shown on line 11 of listing 1.5. This adds an assertion that the value being checked is equal to the value stored previously. It likewise uses a distinct counter to create unique identifiers for the stored values, and increments this counter in the same way as the verifier_equiv_store function. The verifier_equiv_assume function is implemented similarly.
The use of automatically generated identifiers adds the requirement that uses of verifier_equiv_check must occur in the same order and number as uses of verifier_equiv_store. This allows values inside loops to be checked without associating multiple values to stored values with the same identifier, which could make the program trivially correct. It is for this reason that we introduce the verifier_equiv_assume function, which allows the user to verify the program one check at a time.
3.6 Experiments Using SMT Equivalence
We investigated the impact that the non-equivalence of NaN values has on showing equivalence of expressions in this work. We conducted an experiment in which we redefined SMACK’s floating-point equivalence functions ($foeq) in terms of the SMT equality relation, i.e., by defining the $foeq functions to be the built-in = in Boogie. We used this alternate definition on some of the longer-running benchmarks and found that it made the assertion checks faster, sometimes by a factor of 10. This suggests that the solver’s proof obligation to show that the expressions are free from NaN values is responsible for some of the scalability issues we encountered. It should be noted that while there may be some merit in treating NaN values as equivalent when only comparing the equality of returned values, this transformation is not sound in general. For example, the expression x != x should be true when x is NaN, which means that control flow may be affected by treating NaN values as equivalent.
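The control-flow concern can be seen in a two-line example: under IEEE semantics the branch below is taken exactly when x is NaN, so treating NaN values as equal would change which path is followed (illustrative Rust, not one of the benchmarks):

    // x != x holds only for NaN; equating NaN values would flip this branch.
    fn classify(x: f64) -> &'static str {
        if x != x { "nan" } else { "a number" }
    }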
4 Experiments
For this work we chose to check the equivalence of the Rust implementation of libm maintained by the Rust developers. This library is a native, and idiomatic, Rust language reimplementation of the musl libm [12] library. The musl library provides C-language implementations of IEEE 754 math functions. The version of the musl library used in this work is 1.1.19.
The Rust libm library is popular, with over fifty million downloads [8] as a package and more than two hundred dependent packages [9]. The library features for loops and bit-wise operations on both floating-point representations and integers. The average SLoC is 66. Taken together, this library has real-world users, provides clear targets for equivalence checking, and consists of programs of modest size and complexity, making it a good target on which to use SMACK to check for equivalence.
We have three categories of benchmarks: SMACK-library, basic and advanced. The SMACK-library category includes the functions in the math library that correspond to library functions provided by SMACK. Most of these functions are implemented in terms of SMT solver primitives. For these benchmarks, we also check for equivalence between the Rust, musl and SMACK versions of the functions. This gives an additional benefit of checking that SMACK’s math library is correct.
The second category of benchmarks consists of those that do not require much intervention to verify. The third category contains the advanced functions that required advanced methods to help the solver, meaning they would otherwise take at least an hour to verify.
The functions we worked to verify have several features that influence how hard a program is for SMACK to verify and how much manual annotation or modularization is needed. Functions that use bit-wise operations on floating-point numbers tend to take longer to verify due to the mixing of theories, which may limit the solver’s ability to use floating-point-specific reasoning. Functions with multiple branches tend to be slower to prove equivalent because the solver is less able to detect similar structure between the programs.
The basic benchmarks are shown in table 1, the comparison between Rust and SMACK is in table 2, and the comparison between musl and SMACK is in table 3. The advanced benchmarks are shown in table 4.
Table 1.
Rust and musl equivalence results.
In the basic benchmarks, we can see that Corral with Z3 is the worst-performing configuration due to its poor performance on many benchmarks. Boogie with Z3 is the most consistent, verifying all but one benchmark within the time limit of 1200s.
In the comparisons between Rust and SMACK, we see that Boogie with CVC5 is the best-performing verifier, solving all queries within the time limit of 1200s. Boogie with Z3 is the next best, with Corral behind it. In the comparison between musl and SMACK, all three perform worse. This is notable because the SMACK library versions of these functions are written in C and Boogie. It may be that it is easier for the solver to check the equivalence of Rust code against the Boogie code than it is for C code.
In the advanced functions, Boogie with Z3 is the best-performing solver, solving all benchmarks within their time limits. Boogie with CVC5 is the next best, solving two benchmarks, followed by Corral, which solved none. This is partially due to our using Boogie with Z3 to gauge the difficulty of equivalence checks, rather than doing the same with Corral or Boogie with CVC5.
All benchmarks were conducted on an Intel Core i9-12900KS at 5.2 GHz, with 16 performance cores and 64 GB of memory. The versions of the software used were: Z3 version 4.8.12, CVC5 version 1.0.5, Boogie version 2.9.6, and Corral version 1.1.8. The benchmark suite is available at: https://github.com/keram88/libm-equiv. The modifications to SMACK are available at: https://github.com/smackers/smack/tree/equiv.
Table 2.
Rust checked against SMACK.
Table 3.
musl checked against SMACK.
Table 4.
Advanced benchmarks.
5 Experience
Here we describe our experience finding two bugs in the libraries tested and verifying a more complex function.
5.1 Bug discovery
Bug in the Rust nextafter function. While performing equivalence checking between the musl and Rust implementations of the nextafter and nextafterf functions, we discovered an error in Rust’s error detection. We will focus on the double precision function nextafter, which has the signature fn nextafter(x: f64, y: f64) -> f64 in Rust. The behavior of this function is to return the next floating-point number after x in the direction of y, e.g., nextafter(1.0, 2.0) returns the smallest floating-point number greater than 1.0 and nextafter(1.0, 0.0) returns the largest floating-point number less than 1.0.
The IEEE 754 standard for this function [13] specifies error reporting in the case of overflow or underflow. For example, consider the next floating-point number after 1.79E+308 in the direction of inf. Since this is the largest finite floating-point number, the result of nextafter(1.79E+308, inf) is inf. In this situation the result overflows, and the standard says that this should raise the FE_INEXACT and FE_OVERFLOW floating-point exceptions, which may be checked by the user upon return.
We performed variable-wise equivalence checking between the Rust and musl versions of this program. When we checked the variable e that appears on line 3 of both listings 1.8 and 1.9, we found that the two programs diverged for some inputs. This calculation is meant to compute the exponent portion of the result in order to determine whether underflow or overflow has occurred. However, the calculation in the Rust version is functionally equivalent to ux_i >> (52 & 0x7ff), which is effectively ux_i >> 52. This leaves the sign bit of the value in the extracted exponent. The retained sign bit means that if the result is negative, the condition on line 6 of the Rust program will not detect overflow, so the computation on line 7 won’t generate the requisite inexact and overflow flags. We found a similar error in the calculation of the exponent in the 32-bit version of the function. We reported this issue [14] to the Rust developers and our fixes [15] for this issue were merged into the library.
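For illustration, the difference between the intended extraction and the behavior described above is roughly the following (a reconstruction from the description, not the literal library source; ux_i stands for the u64 bit pattern of the f64 argument):

    // Reconstruction of the divergence described above (not the library code).
    fn exponent_intended(ux_i: u64) -> u64 {
        (ux_i >> 52) & 0x7ff // 11-bit biased exponent; the sign bit is masked off
    }

    fn exponent_as_described(ux_i: u64) -> u64 {
        ux_i >> (52 & 0x7ff) // effectively ux_i >> 52: the sign bit remains in bit 11
    }

For a negative argument, the second form yields a value with bit 11 set, so a comparison against 0x7ff never signals the overflow case.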
An interesting aspect of the Rust implementation is that Rust does not currently support accessing the floating-point environment, meaning floating-point exceptions cannot currently be tested in the Rust implementation. This lack of support means that the current tests in the Rust library do not test that the correct floating-point exception is raised. However, SMACK can show that there is a control-flow divergence that would not be detected by the library’s tests. Because of the current lack of testing, if Rust gains support for the floating-point environment at some future point, this incorrect code may be used under the assumption that it is correct. Because SMACK does not rely on the floating-point environment and can analyze a program’s logical control flow, it can find a bug that testing currently cannot.
Bug in SMACK’s math library
In the process of checking the SMACK library functions, we found a bug in SMACK’s modeling of the copysign functions as used in the fmod [19] functions. In listing 1.10, Clang translates the call to copysign into the LLVM intrinsic function @llvm.copysign.f32. SMACK translates this into an undefined uninterpreted function in Boogie, as it does not model this intrinsic. This means the SMACK library fmod function returns a nondeterministic value, so it is not equivalent to the Rust or musl implementations. This does not affect comparing the copysign functions in musl and Rust against SMACK, since Clang does not modify SMACK’s definition. This bug has been reported to the SMACK developers.
5.2 Verifying advanced functions
Here we describe our experiences verifying two functions from the advanced function category.
Verifying k_tanf. The Rust libm function k_tanf (listing 1.12) corresponds to the __kernel_tandf function (listing 1.11) from musl libm. This function computes the tangent of x to 32-bit precision after range reduction. It computes intermediate results in 64-bit precision and then returns the final result rounded to 32-bit precision. The function does not use bit-wise operations.
The function appears straightforward to verify at first; however, it takes more than an hour to check equivalence. Because of this verification time, the function qualifies for advanced techniques, and we applied guided equivalence checking to verify it.
To start, we check the equivalence of r at line 17 in listing 1.12. As expected, this check times out, since it is the same as checking the equivalence of the functions themselves. We then add a check at line 4 while disabling the original check. This equality check can be verified in seconds. We can then add additional checks as shown in listing 1.12. While adding new checks, previously verified checks can be converted to assumptions. This allows the function to be verified incrementally, meaning the current check is the only one being verified, and provides feedback as to which checks are the most difficult, suggesting where more work is needed to verify them.
Finally, once all checks are completed, any assumed equivalence checks can be turned back into equivalence checks. This allows the entire program to be checked without potentially adding faulty assumptions. In this example, verification completes in 27.6s with Boogie as the verifier and Z3 as the solver, which is similar to the total time taken to verify each check individually.
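The incremental workflow can be sketched as follows (an illustrative fragment, not the k_tanf listing; the polynomial and the equiv function names are stand-ins drawn from the extensions described in Sect. 3.5): checks that have already been verified are downgraded to verifier_equiv_assume, and only the check currently under investigation remains an assertion.

    // Illustrative incremental guided checking (not the k_tanf listing).
    use smack::*;

    fn ported_kernel(x: f64) -> f32 {
        let z = x * x;
        verifier_equiv_assume(z);       // verified in an earlier run, now assumed
        let r = z * (0.25 + z * 0.125); // stand-in for the real polynomial
        verifier_equiv_check(r);        // the check currently being verified
        r as f32
    }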
Additional verification techniques. Occasionally, checking equivalence at the variable level can be too slow. The acosf function shown in listing 1.13 demonstrates such a case. Here, the expression at line 16 times out during verification. We can use a more advanced technique here and break the expression up by checking its sub-expressions. On line 14 we see that we can check the second part of the expression. We can continue doing this until we get reasonable verification times for each step. This helps the solver match up more of the sub-expressions, aiding verification. For this program, this reduced verification time from more than an hour to about 15 minutes.
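For a case like this, the checks can be pushed down to sub-expressions (an illustrative fragment with stand-in names, not the acosf source):

    // Illustrative only: check a sub-expression before the combined expression.
    use smack::*;

    fn acos_tail(f: f64, p: f64, q: f64) -> f64 {
        let s = p / q;           // stand-in for the second part of the expression
        verifier_equiv_check(s); // match the sub-expressions first...
        let w = f + s;
        verifier_equiv_check(w); // ...then the expression that uses them
        w
    }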
6 Related Work
In SymDiff [6] the authors looked at making a tool to compute a symbolic difference between two versions of the same program. This is done for C programs translated to the Boogie IVL. The SymDiff tool shows portions of code that have identical behavior, while finding semantic differences between the two programs. Because it operates on Boogie IVL, it is possible that SymDiff could be applied to cross-language comparisons.
In [5], a tool named Keq, based on the \(\mathbb {K}\) framework [17], is used to check translation validation of compiled code. In this work, the authors develop a tool to check the equivalence of translations from C to LLVM and from LLVM to x86 binary code. This tool uses the \(\mathbb {K}\) framework similarly to how SMACK or SymDiff use LLVM-IR/Boogie as a common language for verification. Because of this common language, the authors are able to do cross-language equivalence checking on code generated by a compiler. The tool could be extended to support cross-language verification as in this work through the use of LLVM-IR.
In [10] the authors develop a tool called DiffKemp, which is meant to check the equivalence of changes made to the Linux kernel and to musl. The tool works by trying to apply semantics-preserving change patterns to the code in order to show equivalence, rather than using a solver or an intermediate language. This relates to our guided equivalence checking, which in this work is used to help the solver find the equivalence of corresponding statements. The techniques in this work could benefit DiffKemp by automatically identifying portions of the C and Rust programs that may be equivalent, which would allow for more scalable verification and less user intervention.
In [16] a tool based on the klee system, named uc-klee, is developed to check the equivalence of functions. The klee system is a symbolic execution engine based on LLVM-IR. uc-klee adds equivalence-checking extensions that automate making input variables nondeterministic and checking that returned values are equal. This work is limited to checking the equivalence of functions in a single language, but it could be extended to support cross-language verification.
7 Conclusions
In this work we showed that SMACK can be used as a practical tool for equivalence checking between C and Rust programs. We showed that SMACK can find bugs in these programs that may not otherwise be detectable. The floating-point and bit-vector theories proved to be a scalability challenge, and we developed techniques to address some of these challenges with varying degrees of success. The equivalence-checking extensions added to SMACK yielded the largest improvements to its scalability, albeit with additional labor. These extensions can be used in the future to do equivalence checking on new libraries, possibly ported between languages.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
¹ We chose to write the verification harness in Rust due to a preference for its syntax. It is possible to do this the other way around, where a main function in a C program calls the Rust function instead.