2019 | Book

Formal Verification of Floating-Point Hardware Design

A Mathematical Approach

About this book

This is the first book to focus on the problem of ensuring the correctness of floating-point hardware designs through mathematical methods. Formal Verification of Floating-Point Hardware Design advances a verification methodology based on a unified theory of register-transfer logic and floating-point arithmetic that has been developed and applied to the formal verification of commercial floating-point units over the course of more than two decades, during which the author was employed by several major microprocessor design companies.

The book consists of five parts, the first two of which present a rigorous exposition of the general theory based on the first principles of arithmetic. Part I covers bit vectors and the bit manipulation primitives, integer and fixed-point encodings, and bit-wise logical operations. Part II addresses the properties of floating-point numbers, the formats in which they are encoded as bit vectors, and the various modes of floating-point rounding. In Part III, the theory is extended to the analysis of several algorithms and optimization techniques that are commonly used in commercial implementations of elementary arithmetic operations. As a basis for the formal verification of such implementations, Part IV contains high-level specifications of correctness of the basic arithmetic instructions of several major industry-standard floating-point architectures, including all details pertaining to the handling of exceptional conditions. Part V illustrates the methodology, applying the preceding theory to the comprehensive verification of a state-of-the-art commercial floating-point unit.

All of these results have been formalized in the logic of the ACL2 theorem prover and mechanically checked to ensure their correctness. They are presented here, however, in simple conventional mathematical notation. The book presupposes no familiarity with ACL2, logic design, or any mathematics beyond basic high school algebra. It will be of interest to verification engineers as well as arithmetic circuit designers who appreciate the value of a rigorous approach to their art, and is suitable as a graduate text in computer arithmetic.

Table of Contents

Frontmatter

Part I

Frontmatter
Chapter 1. Basic Arithmetic Functions
Abstract
This chapter examines the properties of the floor, ceiling, and modulus functions, which are central to our formulation of the RTL primitives as well as the floating-point rounding modes. Thus, their definitions and properties are prerequisite to a reading of the subsequent chapters. We also define and investigate the properties of a function that truncates to a specified number of fractional bits, which is related to the floor and is relevant to the analysis of fixed-point encodings, as discussed in Sect. 2.5.
David M. Russinoff
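By way of illustration, here is a minimal Python sketch of such a truncation function, defined directly in terms of the floor; the name chop and the exact signature are ours, not necessarily the book's.

```python
from fractions import Fraction
from math import floor

def chop(x: Fraction, k: int) -> Fraction:
    """Truncate x to k fractional bits: floor(x * 2^k) / 2^k."""
    return Fraction(floor(x * 2**k), 2**k)

# 29/8 = 11.101b; keeping 2 fractional bits truncates it to 11.10b = 7/2.
print(chop(Fraction(29, 8), 2))  # 7/2
```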
Chapter 2. Bit Vectors
Abstract
We shall use the term bit vector as a synonym of integer. Thus, a bit vector may be positive, negative, or zero. However, only a nonnegative bit vector may be associated with a width:
David M. Russinoff
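The central bit-manipulation primitives can be sketched in Python along the following lines; treating bit vectors as unbounded integers means the operations below apply to negative arguments as well. The function names echo the book's primitives, but the definitions here are only illustrative.

```python
def bits(x: int, i: int, j: int) -> int:
    """The slice x[i:j]: bits i down through j of x, as a nonnegative integer."""
    return (x >> j) % (1 << (i - j + 1))

def bitn(x: int, n: int) -> int:
    """The single bit x[n]."""
    return (x >> n) & 1

assert bits(0b110100, 4, 2) == 0b101
assert bitn(-1, 100) == 1  # a negative bit vector has all high bits set
```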
Chapter 3. Logical Operations
Abstract
In this chapter, we define and analyze the four basic logical operations: the unary “not”, or complement, and the binary “and”, “inclusive or”, and “exclusive or”. These are commonly known as bit-wise operations, as each one may be computed by performing a certain operation on each bit of its argument (in the unary case) or each pair of corresponding bits of its arguments (for binary operations). For example, the logical “and” \(x \,\&\, y\) of two bit vectors x and y may be specified in a bit-wise manner as the bit vector z such that for all \(k \in \mathbb{N}\), z[k] = 1 iff x[k] = y[k] = 1.
David M. Russinoff
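The bit-wise characterization quoted above is easy to test against Python's integer operators; a small sanity check, reusing the bitn function from the Chapter 2 sketch:

```python
def bitn(x: int, n: int) -> int:
    return (x >> n) & 1

x, y = 0b101100, 0b100110
z = x & y
# z[k] = 1 iff x[k] = y[k] = 1, for every bit position k
assert all(bitn(z, k) == int(bitn(x, k) == bitn(y, k) == 1) for k in range(8))
print(bin(z))  # 0b100100
```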

Part II

Frontmatter
Chapter 4. Floating-Point Numbers
Abstract
The designation floating-point number is a relative term, used to refer to a rational number that is representable with respect to a particular format, as described in Chap. 5. In this chapter, we discuss properties of real numbers that are relevant to these representations.
David M. Russinoff
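One such property, central to the rounding theory of Chapter 6, is n-exactness: a number is representable with an n-bit significand iff scaling it by \(2^{n-1-e}\), where e is its exponent, yields an integer. A rational-arithmetic sketch of this predicate (our own formulation; the book's definitions may differ in detail):

```python
from fractions import Fraction

def expo(x: Fraction) -> int:
    """The exponent of x: the unique e with 2^e <= |x| < 2^(e+1)."""
    assert x != 0
    x, e = abs(x), 0
    while x >= 2: x, e = x / 2, e + 1
    while x < 1:  x, e = x * 2, e - 1
    return e

def exactp(x: Fraction, n: int) -> bool:
    """True iff x is representable with an n-bit significand."""
    return x == 0 or (x * Fraction(2) ** (n - 1 - expo(x))).denominator == 1

assert exactp(Fraction(3, 4), 2)        # 0.11b has 2 significant bits
assert not exactp(Fraction(1, 10), 53)  # 1/10 is not a binary float
```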
Chapter 5. Floating-Point Formats
Abstract
A floating-point format is a scheme for representing a number as a bit vector consisting of three fields corresponding to its sign, exponent, and significand. In this chapter, we present a classification of such formats, including those prescribed by IEEE Standard 754 [9], and examine the characteristics of the numbers that they represent.
David M. Russinoff
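For instance, Python floats are IEEE double precision, so the three fields of that format can be exposed with the standard struct module. This is a sketch of field extraction only; the interpretation of the fields, including the special encodings, is what the chapter develops.

```python
import struct

def decode_dp(x: float):
    """Split an IEEE double into sign, biased exponent, and trailing significand."""
    b = struct.unpack("<Q", struct.pack("<d", x))[0]
    return b >> 63, (b >> 52) & 0x7FF, b & ((1 << 52) - 1)

# 1.5 = 1.1b * 2^0: sign 0, exponent 0 + bias 1023, significand field 100...0
assert decode_dp(1.5) == (0, 1023, 1 << 51)
```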
Chapter 6. Rounding
Abstract
The objective of floating-point rounding is the approximation of a real number by one that is representable in a given floating-point format. Thus, a rounding mode is a function that computes an n-exact value \(\mathcal {R}(x,n)\), given a real number x and precision n. In this chapter, we investigate the properties of a variety of rounding modes, including those that are prescribed by the IEEE standard as well as others that are commonly used in implementations of floating-point operations.
David M. Russinoff
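The simplest such mode is truncation (round toward zero), which discards all but the leading n bits of the significand; a rational-arithmetic sketch, with the caveat that the book's formal definition may be stated differently:

```python
from fractions import Fraction

def expo(x):
    """The unique e with 2^e <= |x| < 2^(e+1) (as in the Chapter 4 sketch)."""
    assert x != 0
    x, e = abs(x), 0
    while x >= 2: x, e = x / 2, e + 1
    while x < 1:  x, e = x * 2, e - 1
    return e

def rtz(x: Fraction, n: int) -> Fraction:
    """Round toward zero: keep the leading n significand bits of x."""
    if x == 0: return x
    s = Fraction(2) ** (n - 1 - expo(x))   # scale so the significand is integral
    m = abs(x) * s
    return (1 if x > 0 else -1) * Fraction(m.numerator // m.denominator) / s

print(rtz(Fraction(7, 5), 3))  # 7/5 = 1.0110...b truncates to 1.01b = 5/4
```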
Chapter 7. IEEE-Compliant Square Root
Abstract
Many of the preceding results are propositions pertaining to real variables, which are formalized by ACL2 events in which these variables are restricted to the rational domain. Many of the lemmas of this chapter similarly apply to arbitrary real numbers, but in light of our present focus, these results are formulated to correspond more closely with their formal versions. Apart from the informal discussion immediately below, the lemmas themselves contain no references to the real numbers or the square root function.
David M. Russinoff
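The trick that makes a rational-only treatment possible is that the position of \(\sqrt{x}\) relative to any candidate rounding boundary can be decided by comparing squares. A toy illustration of the idea (ours, not a lemma from the chapter):

```python
from fractions import Fraction

def sqrt_in(l: Fraction, x: Fraction, h: Fraction) -> bool:
    """Decide l <= sqrt(x) < h in rational arithmetic only:
    for nonnegative l and h, this holds iff l^2 <= x < h^2."""
    return l * l <= x < h * h

# The 4-bit truncation of sqrt(2) is 1.011b = 11/8: verified without sqrt.
assert sqrt_in(Fraction(11, 8), Fraction(2), Fraction(12, 8))
```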

Part III

Frontmatter
Chapter 8. Addition
Abstract
The problem of computer addition comprises two distinct tasks: (1) the design of an integer adder, and (2) its application to the addition of floating-point numbers. Integer addition is a nontrivial but relatively simple operation that can be performed for ordinary bit-widths within a single clock cycle. Consequently, this operation is generally treated by the RTL designer as a primitive operation to be implemented by a logic synthesis tool. This amounts to a selection from a library of pre-defined adder modules, based on width and timing requirements. Thus, our treatment of this topic, Sect. 8.1, is limited to a brief introduction, focusing on basic concepts that are relevant to later chapters. The remaining two sections of this chapter deal with optimization techniques that are commonly used in the normalization and rounding of floating-point sums.
David M. Russinoff
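As a reminder of the basic concepts of Sect. 8.1, here is the textbook ripple-carry construction, a chain of full adders. This is purely illustrative; as the abstract notes, real designs leave the choice of adder module to synthesis.

```python
def ripple_add(x: int, y: int, w: int):
    """A w-bit ripple-carry adder; returns (sum, carry-out)."""
    s, c = 0, 0
    for k in range(w):
        a, b = (x >> k) & 1, (y >> k) & 1
        s |= (a ^ b ^ c) << k          # full-adder sum bit
        c = (a & b) | (c & (a ^ b))    # full-adder carry into position k+1
    return s, c

assert ripple_add(0b1011, 0b0110, 4) == (0b0001, 1)  # 11 + 6 = 0b10001
```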
Chapter 9. Multiplication
Abstract
While the RTL implementation of integer multiplication is more complex than that of integer addition, the extended problem of floating-point multiplication does not present any significant difficulties that have not already been addressed, and is, in fact, much simpler than floating-point addition. The focus of this chapter, therefore, is the multiplication of natural numbers. All of our results can be readily extended to signed integers.
David M. Russinoff
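At its core, natural-number multiplication is a sum of shifted partial products, one per bit of the multiplier. The chapter is concerned with the encodings and compression schemes that let hardware evaluate this sum quickly; the sum itself is just the following (our sketch):

```python
def mul(x: int, y: int, w: int) -> int:
    """Multiply naturals by summing partial products: x*y = sum of y[k]*(x << k)."""
    return sum(x << k for k in range(w) if (y >> k) & 1)

assert mul(13, 11, 4) == 143  # partial products 13, 26, 104 for bits 0, 1, 3 of 11
```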
Chapter 10. SRT Division and Square Root
Abstract
The simplest and most common approach to computer division is digit recurrence, an iterative process whereby at each step, a multiple of the divisor is subtracted from the current remainder and the quotient is updated accordingly by appending a fixed number of bits k, determined by the underlying radix, \(r = 2^k\). Thus, quotient convergence is linear, resulting in fairly high latencies for high-precision operations at the most common radices, r = 2, 4, and 8.
David M. Russinoff
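The recurrence itself is easy to state. The sketch below performs radix-\(2^k\) digit recurrence with exact (non-redundant) digit selection; real SRT designs instead use a redundant digit set so that each digit can be chosen from a few high-order bits of the remainder, which is the subject of the chapter.

```python
from fractions import Fraction

def digit_recurrence(a: Fraction, b: Fraction, k: int, steps: int) -> Fraction:
    """Divide a by b, developing k quotient bits per step (radix r = 2^k)."""
    r = 1 << k
    q, rem = 0, a
    for _ in range(steps):
        rem *= r                       # shift the remainder left by k bits
        d = min(r - 1, int(rem / b))   # largest digit with d*b <= rem
        rem -= d * b                   # subtract a multiple of the divisor
        q = q * r + d                  # append k quotient bits
    return Fraction(q, r**steps)

print(digit_recurrence(Fraction(1), Fraction(3), 2, 4))  # 85/256, 8 bits of 1/3
```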
Chapter 11. FMA-Based Division
Abstract
Multiplicative division algorithms are typically based on a sequence of approximations of the reciprocal of the divisor b, derived by the Newton-Raphson method.
David M. Russinoff
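The iteration in question is \(y_{k+1} = y_k(2 - by_k)\), which converges quadratically: if \(\epsilon_k = 1 - by_k\), then \(\epsilon_{k+1} = \epsilon_k^2\). A sketch in exact arithmetic follows; a hardware implementation would instead evaluate each step with rounded FMA operations and obtain its seed from a lookup table.

```python
from fractions import Fraction

def recip_newton(b: Fraction, y: Fraction, iters: int) -> Fraction:
    """Refine an initial approximation y of 1/b by Newton-Raphson."""
    for _ in range(iters):
        y = y * (2 - b * y)   # the error squares on each iteration
    return y

y = recip_newton(Fraction(3), Fraction(1, 4), 4)   # crude seed 1/4 for 1/3
print(float(abs(y - Fraction(1, 3))))              # ~ 2^-32 / 3 after 4 steps
```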

Part IV

Frontmatter
Chapter 12. SSE Floating-Point Instructions
Abstract
The SSE floating-point instructions were introduced by Intel in 1998, and the instruction set has been continually extended ever since. They operate on single-precision or double-precision data (Definition 5.3) residing in the 128-bit XMM registers or the 256-bit YMM registers. Some SSE instructions are packed, i.e., they partition their operands into several floating-point encodings to be processed in parallel; others are scalar, performing a single operation, usually on data residing in the low-order bits of their register arguments. The specifications presented in this chapter apply to both scalar and packed instructions that perform the operations of addition, multiplication, division, square root extraction, and FMA.
David M. Russinoff
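To make “packed” concrete: a 128-bit XMM register holds four single-precision lanes, and a packed instruction applies its operation to each lane independently. A rough model of the no-exception case (not the chapter's specification, which also covers rounding control, flags, and special operands):

```python
import struct

def unpack_ps(xmm: bytes):
    """View a 128-bit XMM value as four packed single-precision lanes."""
    assert len(xmm) == 16
    return struct.unpack("<4f", xmm)   # lane 0 in the low-order bits

def addps(a: bytes, b: bytes) -> bytes:
    """Lane-wise packed addition, modeling only the unexceptional case."""
    return struct.pack("<4f", *(x + y for x, y in zip(unpack_ps(a), unpack_ps(b))))

a = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
b = struct.pack("<4f", 0.5, 0.5, 0.5, 0.5)
print(unpack_ps(addps(a, b)))          # (1.5, 2.5, 3.5, 4.5)
```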
Chapter 13. x87 Instructions
Abstract
The x87 instruction set was Intel’s first floating-point architecture, introduced in 1981 with the 8087 hardware coprocessor. The architecture provides an array of eight 80-bit data registers, which is managed as a circular stack. Each numerical operand is located either in an x87 data register or in memory and is interpreted according to one of the data formats defined by Definition 5.3. Data register contents are always interpreted according to the double extended precision format (EP). Memory operands may be encoded in the single (SP), double (DP), or double extended precision format. Numerical results are written only to the data registers in the EP format. The architecture also provides distinct 16-bit control and status registers, the FCW and the FSW, corresponding to the single SSE register MXCSR.
David M. Russinoff
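As one concrete detail of the sort the chapter's specifications must capture: per Intel's documentation, bits 11:10 of the FCW select the rounding mode. A decoding sketch, with field positions taken from the Intel manuals rather than from the book:

```python
def fcw_rounding_mode(fcw: int) -> str:
    """Decode the x87 FCW rounding-control field, bits 11:10."""
    return ["nearest-even", "toward -inf", "toward +inf", "toward zero"][(fcw >> 10) & 3]

print(fcw_rounding_mode(0x037F))  # the x87 reset default rounds to nearest-even
```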
Chapter 14. Arm Floating-Point Instructions
Abstract
The first Arm Floating-Point Accelerator, which appeared in 1993, resembled the x87 coprocessor in its use of an 80-bit EP register file. This was succeeded by the Vector Floating-Point (VFP) architecture, which included 64-bit registers implementing the single-, double-, and half-precision data formats. The NEON Advanced SIMD extension later added 128-bit instructions for media and signal processing applications.
David M. Russinoff

Part V

Frontmatter
Chapter 15. The Modeling Language
Abstract
The design of our modeling language—its features as well as the restrictions that we impose on it—is driven by the following goals:
David M. Russinoff
Chapter 16. Double-Precision Multiplication
Abstract
The first illustration of our verification methodology is a proof of correctness of a double-precision floating-point multiplier that supports both the binary FMUL instruction and the ternary FMA. In a typical implementation of the latter, addition is combined with multiplication in a single pipeline by inserting the addend into the multiplier’s compression tree as an additional partial product. The resulting FMA latency is somewhat greater than that of a simple multiplication but less than that of two successive operations. One drawback of this integrated approach is that the computation cannot be initiated until all three operands are available. Another is that, in order to conserve area, hardware is typically shared with the operations of pure multiplication and addition, each of which is implemented as a degenerate case of FMA, resulting in increased latencies.
David M. Russinoff
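The semantic point of FMA is that \(a \cdot b + c\) incurs a single rounding, whereas separate FMUL and FADD instructions round twice. A classic instance of the difference, with exact rational arithmetic standing in for a fused unit (an illustration of IEEE arithmetic in general, not of fmul64 itself):

```python
from fractions import Fraction

a = b = float(2**27 + 1)
c = float(-(2**54))

unfused = a * b + c   # a*b rounds first, losing its low-order 1
exact = Fraction(a) * Fraction(b) + Fraction(c)   # 2^28 + 1 exactly
fused = exact.numerator / exact.denominator       # a single rounding
print(unfused, fused)  # 268435456.0 vs 268435457.0
```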
Chapter 17. Double-Precision Addition and FMA
Abstract
The double-precision addition module fadd64 supports both the binary FADD instruction and, in concert with the multiplier, the ternary FMA instruction. Thus, both operands may be simple DP encodings, or one of them may be an unrounded product in the format of the data output of fmul64 as described in Chap. 16.
David M. Russinoff
Chapter 18. Multi-Precision Radix-4 SRT Division
Abstract
Unlike the dedicated double-precision multiplier and adder described in the preceding two chapters, a single module of our FPU performs floating-point division and square root extraction at all three precisions: double, single, and half. This module is modeled, however, by two separate functions, fdiv64 and fsqrt64, the first of which, displayed in Appendix D, is the subject of this chapter. This function is based on the implementation of the minimally redundant radix-4 case of SRT division that is addressed by Lemma 10.7 of Sect. 10.2.
David M. Russinoff
Chapter 19. Multi-Precision Radix-4 SRT Square Root
Abstract
Finally, we present the function fsqrt64, which performs double-, single-, and half-precision square root extraction. This function, which is listed in Appendix E, is based on an implementation of the minimally redundant radix-4 case of SRT square root extraction characterized by Lemma 10.15 of Sect. 10.5. As noted in Chap. 18, it is derived from the same RTL module as the function fdiv64. The design shares hardware between the two operations for post-processing; therefore, the auxiliary functions computeQ, rounder, and final are shared by the two models.
David M. Russinoff
Backmatter
Metadata
Title: Formal Verification of Floating-Point Hardware Design
Author: David M. Russinoff
Copyright Year: 2019
Electronic ISBN: 978-3-319-95513-1
Print ISBN: 978-3-319-95512-4
DOI: https://doi.org/10.1007/978-3-319-95513-1