Skip to main content
Top

2019 | OriginalPaper | Chapter

6. High-Performance FFT Algorithms

Author : Daisuke Takahashi

Published in: Fast Fourier Transform Algorithms for Parallel Computers

Publisher: Springer Singapore

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

This chapter presents high-performance FFT algorithms. First, the four-step FFT algorithm and five-step FFT algorithm are described. Next, the six-step FFT algorithm and blocked six-step FFT algorithm are explained. Then, nine-step FFT algorithm and recursive six-step FFT, and blocked multidimensional FFT algorithms are described. Finally, FFT algorithms suitable for fused multiply–add instructions and FFT algorithms for SIMD instructions are explained.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Bailey, D.H.: FFTs in external or hierarchical memory. J. Supercomput. 4, 23–35 (1990)CrossRef Bailey, D.H.: FFTs in external or hierarchical memory. J. Supercomput. 4, 23–35 (1990)CrossRef
2.
go back to reference Brigham, E.O.: The Fast Fourier Transform and Its Applications. Prentice-Hall, Englewood Cliffs, NJ (1988) Brigham, E.O.: The Fast Fourier Transform and Its Applications. Prentice-Hall, Englewood Cliffs, NJ (1988)
3.
go back to reference Burrus, C.S., Eschenbacher, P.W.: An in-place, in-order prime factor FFT algorithm. IEEE Trans. Acoust. Speech Signal Process. ASSP-29, 806–817 (1981)CrossRef Burrus, C.S., Eschenbacher, P.W.: An in-place, in-order prime factor FFT algorithm. IEEE Trans. Acoust. Speech Signal Process. ASSP-29, 806–817 (1981)CrossRef
4.
go back to reference Franchetti, F., Karner, H., Kral, S., Ueberhuber, C.W.: Architecture independent short vector FFTs. In: Proceedings of 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2001), vol. 2, pp. 1109–1112 (2001) Franchetti, F., Karner, H., Kral, S., Ueberhuber, C.W.: Architecture independent short vector FFTs. In: Proceedings of 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2001), vol. 2, pp. 1109–1112 (2001)
5.
go back to reference Frigo, M., Johnson, S.G.: The design and implementation of FFTW3. Proc. IEEE 93, 216–231 (2005)CrossRef Frigo, M., Johnson, S.G.: The design and implementation of FFTW3. Proc. IEEE 93, 216–231 (2005)CrossRef
6.
go back to reference Goedecker, S.: Fast radix 2, 3, 4, and 5 kernels for fast Fourier transformations on computers with overlapping multiply-add instructions. SIAM J. Sci. Comput. 18, 1605–1611 (1997)MathSciNetCrossRef Goedecker, S.: Fast radix 2, 3, 4, and 5 kernels for fast Fourier transformations on computers with overlapping multiply-add instructions. SIAM J. Sci. Comput. 18, 1605–1611 (1997)MathSciNetCrossRef
7.
go back to reference Hegland, M.: A self-sorting in-place fast Fourier transform algorithm suitable for vector and parallel processing. Numer. Math. 68, 507–547 (1994)MathSciNetCrossRef Hegland, M.: A self-sorting in-place fast Fourier transform algorithm suitable for vector and parallel processing. Numer. Math. 68, 507–547 (1994)MathSciNetCrossRef
10.
go back to reference Karner, H., Auer, M., Ueberhuber, C.W.: Multiply-add optimized FFT kernels. Math. Models Methods Appl. Sci. 11, 105–117 (2001)MathSciNetCrossRef Karner, H., Auer, M., Ueberhuber, C.W.: Multiply-add optimized FFT kernels. Math. Models Methods Appl. Sci. 11, 105–117 (2001)MathSciNetCrossRef
11.
go back to reference Kolba, D.P., Parks, T.W.: A prime factor FFT algorithm using high-speed convolution. IEEE Trans. Acoust. Speech Signal Process. ASSP-25, 281–294 (1977)CrossRef Kolba, D.P., Parks, T.W.: A prime factor FFT algorithm using high-speed convolution. IEEE Trans. Acoust. Speech Signal Process. ASSP-25, 281–294 (1977)CrossRef
12.
go back to reference Kral, S., Franchetti, F., Lorenz, J., Ueberhuber, C.W.: SIMD vectorization of straight line FFT code. In: Proceedings of 9th International Euro-Par Conference (Euro-Par 2003). Lecture Notes in Computer Science, vol. 2790, pp. 251–260. Springer, Berlin (2003)CrossRef Kral, S., Franchetti, F., Lorenz, J., Ueberhuber, C.W.: SIMD vectorization of straight line FFT code. In: Proceedings of 9th International Euro-Par Conference (Euro-Par 2003). Lecture Notes in Computer Science, vol. 2790, pp. 251–260. Springer, Berlin (2003)CrossRef
13.
14.
go back to reference Linzer, E.N., Feig, E.: Implementation of efficient FFT algorithms on fused multiply-add architectures. IEEE Trans. Signal Process. 41, 93–107 (1993)CrossRef Linzer, E.N., Feig, E.: Implementation of efficient FFT algorithms on fused multiply-add architectures. IEEE Trans. Signal Process. 41, 93–107 (1993)CrossRef
15.
go back to reference McFarlin, D.S., Arbatov, V., Franchetti, F., Püschel, M.: Automatic SIMD vectorization of fast Fourier transforms for the Larrabee and AVX instruction sets. In: Proceedings of 25th International Conference on Supercomputing (ICS’11), pp. 265–274 (2011) McFarlin, D.S., Arbatov, V., Franchetti, F., Püschel, M.: Automatic SIMD vectorization of fast Fourier transforms for the Larrabee and AVX instruction sets. In: Proceedings of 25th International Conference on Supercomputing (ICS’11), pp. 265–274 (2011)
16.
go back to reference Nadehara, K., Miyazaki, T., Kuroda, I.: Radix-4 FFT implementation using SIMD multimedia instructions. In: Proceedings of 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’99), vol. 4, pp. 2131–2134 (1999) Nadehara, K., Miyazaki, T., Kuroda, I.: Radix-4 FFT implementation using SIMD multimedia instructions. In: Proceedings of 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’99), vol. 4, pp. 2131–2134 (1999)
17.
go back to reference Nussbaumer, H.J.: Fast Fourier Transform and Convolution Algorithms, updated edn. Springer-Verlag, New York (1982). Second corrected and updated editionCrossRef Nussbaumer, H.J.: Fast Fourier Transform and Convolution Algorithms, updated edn. Springer-Verlag, New York (1982). Second corrected and updated editionCrossRef
18.
go back to reference Popovici, D.T., Franchetti, F., Low, T.M.: Mixed data layout kernels for vectorized complex arithmetic. In: Proceedings of 2017 IEEE High Performance Extreme Computing Conference (HPEC 2017) (2017) Popovici, D.T., Franchetti, F., Low, T.M.: Mixed data layout kernels for vectorized complex arithmetic. In: Proceedings of 2017 IEEE High Performance Extreme Computing Conference (HPEC 2017) (2017)
19.
go back to reference Püschel, M., Moura, J.M.F., Johnson, J.R., Padua, D., Veloso, M.M., Singer, B.W., Xiong, J., Franchetti, F., Gačić, A., Voronenko, Y., Chen, K., Johnson, R.W., Rizzolo, N.: SPIRAL: code generation for DSP transforms. Proc. IEEE 93, 232–275 (2005)CrossRef Püschel, M., Moura, J.M.F., Johnson, J.R., Padua, D., Veloso, M.M., Singer, B.W., Xiong, J., Franchetti, F., Gačić, A., Voronenko, Y., Chen, K., Johnson, R.W., Rizzolo, N.: SPIRAL: code generation for DSP transforms. Proc. IEEE 93, 232–275 (2005)CrossRef
20.
go back to reference Rodriguez, P.: A radix-2 FFT algorithm for modern single instruction multiple data (SIMD) architectures. In: Proceedings of 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2002), vol. 3, pp. 3220–3223 (2002) Rodriguez, P.: A radix-2 FFT algorithm for modern single instruction multiple data (SIMD) architectures. In: Proceedings of 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2002), vol. 3, pp. 3220–3223 (2002)
21.
go back to reference Takahashi, D.: A new radix-6 FFT algorithm suitable for multiply-add instruction. In: Proceedings of 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2000), vol. 6, pp. 3343–3346 (2000) Takahashi, D.: A new radix-6 FFT algorithm suitable for multiply-add instruction. In: Proceedings of 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2000), vol. 6, pp. 3343–3346 (2000)
22.
go back to reference Takahashi, D.: A blocking algorithm for parallel 1-D FFT on shared-memory parallel computers. In: Proceedings of 6th International Conference on Applied Parallel Computing (PARA 2002). Lecture Notes in Computer Science, vol. 2367, pp. 380–389. Springer, Berlin (2002) Takahashi, D.: A blocking algorithm for parallel 1-D FFT on shared-memory parallel computers. In: Proceedings of 6th International Conference on Applied Parallel Computing (PARA 2002). Lecture Notes in Computer Science, vol. 2367, pp. 380–389. Springer, Berlin (2002)
23.
24.
go back to reference Takahashi, D.: A radix-16 FFT algorithm suitable for multiply-add instruction based on Goedecker method. In: Proceedings of 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), vol. 2, pp. 665–668 (2003) Takahashi, D.: A radix-16 FFT algorithm suitable for multiply-add instruction based on Goedecker method. In: Proceedings of 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), vol. 2, pp. 665–668 (2003)
25.
go back to reference Takahashi, D.: An implementation of parallel 1-D FFT using SSE3 instructions on dual-core processors. In: Proceedings of 8th International Workshop on State of the Art in Scientific Computing (PARA 2006). Lecture Notes in Computer Science, vol. 4699, pp. 1178–1187. Springer, Berlin (2007) Takahashi, D.: An implementation of parallel 1-D FFT using SSE3 instructions on dual-core processors. In: Proceedings of 8th International Workshop on State of the Art in Scientific Computing (PARA 2006). Lecture Notes in Computer Science, vol. 4699, pp. 1178–1187. Springer, Berlin (2007)
26.
go back to reference Takahashi, D.: An implementation of parallel 2-D FFT using Intel AVX instructions on multi-core processors. In: Proceedings of 12th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2012), Part II. Lecture Notes in Computer Science, vol. 7440, pp. 197–205. Springer, Berlin (2012)CrossRef Takahashi, D.: An implementation of parallel 2-D FFT using Intel AVX instructions on multi-core processors. In: Proceedings of 12th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2012), Part II. Lecture Notes in Computer Science, vol. 7440, pp. 197–205. Springer, Berlin (2012)CrossRef
27.
go back to reference Takahashi, D., Boku, T., Sato, M.: A blocking algorithm for parallel 1-D FFT on clusters of PCs. In: Proceedings of 8th International Euro-Par Conference (Euro-Par 2002). Lecture Notes in Computer Science, vol. 2400, pp. 691–700. Springer, Berlin (2002)CrossRef Takahashi, D., Boku, T., Sato, M.: A blocking algorithm for parallel 1-D FFT on clusters of PCs. In: Proceedings of 8th International Euro-Par Conference (Euro-Par 2002). Lecture Notes in Computer Science, vol. 2400, pp. 691–700. Springer, Berlin (2002)CrossRef
28.
go back to reference Takahashi, D., Uno, A., Yokokawa, M.: An implementation of parallel 1-D FFT on the K computer. In: Proceedings of 2012 IEEE 14th International Conference on High Performance Computing and Communications (HPCC-2012), pp. 344–350 (2012) Takahashi, D., Uno, A., Yokokawa, M.: An implementation of parallel 1-D FFT on the K computer. In: Proceedings of 2012 IEEE 14th International Conference on High Performance Computing and Communications (HPCC-2012), pp. 344–350 (2012)
29.
go back to reference Van Loan, C.: Computational Frameworks for the Fast Fourier Transform. SIAM Press, Philadelphia, PA (1992)CrossRef Van Loan, C.: Computational Frameworks for the Fast Fourier Transform. SIAM Press, Philadelphia, PA (1992)CrossRef
Metadata
Title
High-Performance FFT Algorithms
Author
Daisuke Takahashi
Copyright Year
2019
Publisher
Springer Singapore
DOI
https://doi.org/10.1007/978-981-13-9965-7_6

Premium Partner