
Following an introduction to the basics of the fast Fourier transform (FFT), this book focuses on the implementation details of the FFT for parallel computers. The FFT is an efficient implementation of the discrete Fourier transform (DFT) and is widely used in many applications in engineering, science, and mathematics. Presenting many algorithms in pseudo-code together with a complexity analysis, this book offers a valuable reference guide for graduate students, engineers, and scientists in the field who wish to apply the FFT to large-scale problems.

Parallel computation is becoming indispensable in solving the large-scale problems increasingly arising in a wide range of applications. The performance of parallel supercomputers is steadily improving, and it is expected that a massively parallel system with hundreds of thousands of compute nodes equipped with multi-core processors and accelerators will be available in the near future. Accordingly, the book also provides up-to-date computational techniques relevant to the FFT in state-of-the-art parallel computers.

Following the introductory chapter, Chapter 2 introduces readers to the DFT and the basic idea of the FFT. Chapter 3 explains mixed-radix FFT algorithms, while Chapter 4 describes split-radix FFT algorithms. Chapter 5 explains multidimensional FFT algorithms, Chapter 6 presents high-performance FFT algorithms, and Chapter 7 addresses parallel FFT algorithms for shared-memory parallel computers. In closing, Chapter 8 describes parallel FFT algorithms for distributed-memory parallel computers.

Chapter 1. Introduction

Abstract
The fast Fourier transform (FFT) is an efficient implementation of the discrete Fourier transform (DFT). The FFT is widely used in numerous applications in engineering, science, and mathematics. This chapter briefly describes the history of the FFT and introduces the book.
Daisuke Takahashi

Chapter 2. Fast Fourier Transform

Abstract
This chapter introduces the definition of the DFT and the basic idea of the FFT. Then, the Cooley–Tukey FFT algorithm, the bit-reversal permutation, and the Stockham FFT algorithm are explained. Finally, an FFT algorithm for real data is described.
Daisuke Takahashi
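
As a rough illustration of the relationship between the DFT and the FFT mentioned in this abstract, the following is a minimal sketch rather than the book's own pseudo-code. It assumes the common sign convention y[k] = sum_j x[j] exp(-2*pi*i*j*k/n), and the function names `dft` and `fft_radix2` are hypothetical. The recursive radix-2 decimation-in-time form shown here is one standard way to present the Cooley–Tukey idea; it is not the in-place or Stockham variant.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, assuming y[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT (illustrative sketch).

    The length of x must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    # Split into even- and odd-indexed halves and transform each recursively.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor exp(-2*pi*i*k/n) applied to the odd half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t            # butterfly: upper output
        out[k + n // 2] = even[k] - t   # butterfly: lower output
    return out
```

Both functions should agree to within rounding error on any power-of-two input; the recursive version performs O(n log n) arithmetic operations instead of O(n^2).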

Chapter 3. Mixed-Radix FFT Algorithms

Abstract
This chapter presents mixed-radix FFT algorithms. First, a two-dimensional formulation of the DFT is given. Next, radix-3, 4, 5, and 8 FFT algorithms are described.
Daisuke Takahashi

Chapter 4. Split-Radix FFT Algorithms

Abstract
This chapter presents split-radix FFT algorithms. First, the split-radix FFT algorithm is given. Next, the extended split-radix FFT algorithm is described.
Daisuke Takahashi

Chapter 5. Multidimensional FFT Algorithms

Abstract
In this chapter, two- and three-dimensional FFT algorithms are explained as examples of multidimensional FFT algorithms. Multidimensional FFT algorithms include the row–column algorithm and the vector-radix FFT algorithm (Rivard, IEEE Trans. Acoust. Speech Signal Process. 25(3), 250–252, 1977 [1]). We describe multidimensional FFT algorithms based on the row–column algorithm.
Daisuke Takahashi
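
The row–column approach named in this abstract can be sketched as follows: transform every row with a one-dimensional transform, then transform every column of the result. This is a minimal sketch under stated assumptions, not the book's implementation. The names `dft` and `fft2_row_column` are hypothetical, and a naive O(n^2) DFT stands in for the one-dimensional FFT to keep the example short.

```python
import cmath


def dft(x):
    """Naive 1-D DFT used as a stand-in for a 1-D FFT (assumption)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft2_row_column(a):
    """2-D DFT of a rectangular list of lists via the row-column method."""
    # Step 1: 1-D transform along every row.
    rows = [dft(row) for row in a]
    # Step 2: 1-D transform along every column (zip(*rows) transposes).
    cols = [dft(list(col)) for col in zip(*rows)]
    # Transpose back so the result is indexed [row][column] again.
    return [list(r) for r in zip(*cols)]
```

A three-dimensional transform follows the same pattern with a third pass along the remaining axis.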

Chapter 6. High-Performance FFT Algorithms

Abstract
This chapter presents high-performance FFT algorithms. First, the four-step FFT algorithm and the five-step FFT algorithm are described. Next, the six-step FFT algorithm and the blocked six-step FFT algorithm are explained. Then, the nine-step FFT algorithm, the recursive six-step FFT algorithm, and blocked multidimensional FFT algorithms are described. Finally, FFT algorithms suitable for fused multiply–add instructions and FFT algorithms for SIMD instructions are explained.
Daisuke Takahashi
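
To give a flavor of the six-step FFT mentioned in this abstract, the sketch below follows the commonly described structure for n = n1 * n2: transpose, n1 transforms of length n2, twiddle-factor multiplication, transpose, n2 transforms of length n1, and a final transpose. It is a hedged illustration, not the book's algorithm. The names `dft` and `six_step_fft` are hypothetical, a naive DFT stands in for the short FFTs, and the transposes are expressed as index gathers on a flat list rather than as separate cache-friendly passes.

```python
import cmath


def dft(x):
    """Naive 1-D DFT used as a stand-in for the short FFTs (assumption)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def six_step_fft(x, n1, n2):
    """DFT of length n = n1 * n2 using the six-step decomposition.

    Input index:  j = j1 + j2 * n1
    Output index: k = k2 + k1 * n2
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Steps 1-2: gather each j1 slice (transpose), then n1 DFTs of length n2.
    t = [dft([x[j1 + j2 * n1] for j2 in range(n2)]) for j1 in range(n1)]
    # Step 3: multiply by the twiddle factors exp(-2*pi*i*j1*k2/n).
    for j1 in range(n1):
        for k2 in range(n2):
            t[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / n)
    # Steps 4-5: transpose, then n2 DFTs of length n1.
    u = [dft([t[j1][k2] for j1 in range(n1)]) for k2 in range(n2)]
    # Step 6: final transpose into natural output order.
    y = [0j] * n
    for k2 in range(n2):
        for k1 in range(n1):
            y[k2 + k1 * n2] = u[k2][k1]
    return y
```

For example, `six_step_fft(x, 3, 4)` on a length-12 input should match `dft(x)` up to rounding error.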

Chapter 7. Parallel FFT Algorithms for Shared-Memory Parallel Computers

Abstract
This chapter presents parallel FFT algorithms for shared-memory parallel computers. First, the implementation of a parallel one-dimensional FFT on shared-memory parallel computers is described. Next, the optimization of parallel FFTs for manycore processors and the resulting performance are explained.
Daisuke Takahashi

Chapter 8. Parallel FFT Algorithms for Distributed-Memory Parallel Computers

Abstract
This chapter presents parallel FFT algorithms for distributed-memory parallel computers. First, the implementation of parallel FFTs on distributed-memory parallel computers and computation–communication overlap for the parallel one-dimensional FFT are explained. Next, a parallel three-dimensional FFT using two-dimensional decomposition is described. Then, the optimization of all-to-all communication on multicore cluster systems is explained. Finally, a parallel one-dimensional FFT on a GPU cluster is described.
Daisuke Takahashi