Skip to main content
Erschienen in: The Journal of Supercomputing 1/2023

02.07.2022

4-Valued spectral transforms implementation on GPU with Tensor Cores

verfasst von: Ivica Marković, Suzana Stojković

Erschienen in: The Journal of Supercomputing | Ausgabe 1/2023

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

A spectral transform maps a function from one domain into an appropriate function in another domain where certain characteristics of the function are clearly visible. Spectral transforms have great importance in signal analysis, image processing, logic design, etc. The main problem with spectral transforms is their exponential computational complexity. In the case of discrete functions, spectral transform computation comes down to multiplying the transform matrix and the truth vector of the function. Most of the previously developed algorithms for spectral transforms computation are based on the fast Fourier transform algorithm, some use a compact representation of the functions (such as decision diagrams), and some use special single instruction multiple data hardware structures (such as graphics processing units). In the last years, a special type of graphics processing units with Tensor Cores has been developed for matrix multiplication. These units usually support matrix operations on limited data types and matrix dimensions. In this paper, we propose algorithms for 4-valued Reed–Muller–Fourier and Vilenkin–Chrestenson transforms on the Tensor Cores hardware. Our solution is a customization of the Cooley–Tuckey algorithm for execution on the hardware with specified limitations. Computation times of spectral transforms by the proposed algorithm are compared with computation times of the same transforms on a central processing unit by using serial and parallel algorithms, and on a standard graphics processing units. The described experiments showed that, for a large number of variables, both implementations that are executed on graphics processing units are significantly more efficient than those that are executed on central processing unit. If only implementations on graphics processing units are compared, for the functions of 14 variables, the Tensor Cores implementation of the Reed–Muller–Fourier transform is 2.03 times faster, and the implementation of the Vilenkin–Chrestenson transform is 1.5 times faster. Poorer results obtained for the Vilenkin–Chrestenson transform are due to the limited set of data types provided by the NVIDIA Turing Graphics Processing Units that were used in the experiments. Therefore, one integer spectral coefficient is represented by 4-byte values. Regardless, the proposed algorithms and the Tensor Cores architecture have proven to be a good solution for the spectral transforms calculations.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Karpovsky MG, Stankovic RS, Astola JT (2008) Spectral logic and its applications for the design of digital devices. Wiley Karpovsky MG, Stankovic RS, Astola JT (2008) Spectral logic and its applications for the design of digital devices. Wiley
2.
Zurück zum Zitat Thornton MA, Drechsler R, Miller DM (2001) Spectral techniques in VLSI CAD. Springer Thornton MA, Drechsler R, Miller DM (2001) Spectral techniques in VLSI CAD. Springer
3.
Zurück zum Zitat Rao KR, Narasimhan MA, Revuluri K (1975) Image data processing by Hadamard-Haar transform. IEEE Trans Comput C–24(9):888–896CrossRefMATH Rao KR, Narasimhan MA, Revuluri K (1975) Image data processing by Hadamard-Haar transform. IEEE Trans Comput C–24(9):888–896CrossRefMATH
4.
Zurück zum Zitat Yaroslavsky LP (2014) Fast transforms in image processing: compression, restoration, and resampling. Adv Electr Eng 2014:276241 Yaroslavsky LP (2014) Fast transforms in image processing: compression, restoration, and resampling. Adv Electr Eng 2014:276241
5.
Zurück zum Zitat Oczeretko E, Borowska M, Brzozowska E, Pawlinski B, Borusiewicz A, Gajewski Z (2015) Walsh-Hadamard spectral analysis of signals representing bioelectrical activity of the reproductive tract in pigs. In: IEEE 15th International Conference on Bioinformatics and Bioengineering (BIBE), Belgrade, pp 1–5 Oczeretko E, Borowska M, Brzozowska E, Pawlinski B, Borusiewicz A, Gajewski Z (2015) Walsh-Hadamard spectral analysis of signals representing bioelectrical activity of the reproductive tract in pigs. In: IEEE 15th International Conference on Bioinformatics and Bioengineering (BIBE), Belgrade, pp 1–5
6.
Zurück zum Zitat Jabeen D, Monir G, Azim F (2015) Sequency Domain Signal Processing Using Complex Hadamard Transform. Circuits, Systems, and Signal Processing, published online 04 August 2015, pp 1–11 Jabeen D, Monir G, Azim F (2015) Sequency Domain Signal Processing Using Complex Hadamard Transform. Circuits, Systems, and Signal Processing, published online 04 August 2015, pp 1–11
7.
Zurück zum Zitat Miller DM (1994) Spectral transformation of multiple-valued decision diagrams. In: 24th International Symposium on Multiple-Valued Logic (ISMVL’94), Boston, MA, USA, pp 89–96 Miller DM (1994) Spectral transformation of multiple-valued decision diagrams. In: 24th International Symposium on Multiple-Valued Logic (ISMVL’94), Boston, MA, USA, pp 89–96
8.
Zurück zum Zitat Thornton M, Drechsler R (2001) Spectral decision diagrams using graph transformations. In: Proceedings Design, Automation and Test in Europe. Conference and Exhibition 2001, pp 713–717 Thornton M, Drechsler R (2001) Spectral decision diagrams using graph transformations. In: Proceedings Design, Automation and Test in Europe. Conference and Exhibition 2001, pp 713–717
9.
Zurück zum Zitat Townsend WJ, Thornton MA, Drechsler R, Miller DM (2002) Computing Walsh, arithmetic, and Reed-Muller spectral decision diagrams using graph transformations. In: Proceedings of the 12th ACM Great Lakes symposium on VLSI, pp 178–183 Townsend WJ, Thornton MA, Drechsler R, Miller DM (2002) Computing Walsh, arithmetic, and Reed-Muller spectral decision diagrams using graph transformations. In: Proceedings of the 12th ACM Great Lakes symposium on VLSI, pp 178–183
10.
Zurück zum Zitat Stanković RS, Falkowski BJ (2002) Spectral transforms calculation through decision diagrams. VLSI Des 14(1):5–12MathSciNetCrossRef Stanković RS, Falkowski BJ (2002) Spectral transforms calculation through decision diagrams. VLSI Des 14(1):5–12MathSciNetCrossRef
11.
Zurück zum Zitat Jankovic D, Stankovic RS, Drechsler R (2001) Decision diagram method for calculation of pruned Walsh transform. IEEE Trans Comput 50(2):147–157CrossRef Jankovic D, Stankovic RS, Drechsler R (2001) Decision diagram method for calculation of pruned Walsh transform. IEEE Trans Comput 50(2):147–157CrossRef
12.
Zurück zum Zitat Andrade J, Falcao G, Silva V (2014) Optimized Fast Walsh-Hadamard Transform on GPUs for non-binary LDPC decoding. Parallel Comput 40(9):449–453MathSciNetCrossRef Andrade J, Falcao G, Silva V (2014) Optimized Fast Walsh-Hadamard Transform on GPUs for non-binary LDPC decoding. Parallel Comput 40(9):449–453MathSciNetCrossRef
13.
Zurück zum Zitat Pereira PMM, Domingues P, Rodrigues NMM, Faria SM, Fernandes G (2016) Optimized fast Walsh-Hadamard transform on OpenCL-GPU and OpenCL-CPU. In: Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), pp 1–6 Pereira PMM, Domingues P, Rodrigues NMM, Faria SM, Fernandes G (2016) Optimized fast Walsh-Hadamard transform on OpenCL-GPU and OpenCL-CPU. In: Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA), pp 1–6
14.
Zurück zum Zitat Bikov D, Bouyukliev I (2018) Parallel fast Walsh transform algorithm and its implementation with CUDA on GPUs. Cybern Inf Technol 18(5):21–43MathSciNet Bikov D, Bouyukliev I (2018) Parallel fast Walsh transform algorithm and its implementation with CUDA on GPUs. Cybern Inf Technol 18(5):21–43MathSciNet
15.
Zurück zum Zitat Gajić DB, Stanković RS (2011) GPU accelerated computation of fast spectral transforms. Facta Universitatis - Series: Electronics and Energetics (Special issue Reed-Muller) 24(3):483–499 Gajić DB, Stanković RS (2011) GPU accelerated computation of fast spectral transforms. Facta Universitatis - Series: Electronics and Energetics (Special issue Reed-Muller) 24(3):483–499
16.
Zurück zum Zitat Stankovic RS, Astola J, Moraga C, Gajic D (2014) Constant geometry algorithms for Galois field expressions and their implementation on GPUs. In: 2014 IEEE 44th International Symposium on Multiple-valued Logic, Bremen, pp 79–84 Stankovic RS, Astola J, Moraga C, Gajic D (2014) Constant geometry algorithms for Galois field expressions and their implementation on GPUs. In: 2014 IEEE 44th International Symposium on Multiple-valued Logic, Bremen, pp 79–84
17.
Zurück zum Zitat Gajić DB, Stanković RS (2015) Computation of the Vilenkin-Chrestenson transform on a GPU. J Multiple-Valued Logic Soft Comput 24(1–4):317–340MathSciNetMATH Gajić DB, Stanković RS (2015) Computation of the Vilenkin-Chrestenson transform on a GPU. J Multiple-Valued Logic Soft Comput 24(1–4):317–340MathSciNetMATH
19.
Zurück zum Zitat Morchid M (2018) Parsimonious memory unit for recurrent neural networks with application to natural language processing. Neurocomputing 314:48–64CrossRef Morchid M (2018) Parsimonious memory unit for recurrent neural networks with application to natural language processing. Neurocomputing 314:48–64CrossRef
20.
Zurück zum Zitat Chmielewski Ł, Weissbart L, On Reverse Engineering Neural Network Implementation on GPU. In: Applied Cryptography and Network Security Workshops. ACNS 2021. Lecture Notes in Computer Science, vol 12809 Chmielewski Ł, Weissbart L, On Reverse Engineering Neural Network Implementation on GPU. In: Applied Cryptography and Network Security Workshops. ACNS 2021. Lecture Notes in Computer Science, vol 12809
21.
Zurück zum Zitat Bacanin N, Bezdan T, Venkatachalam K, Al-Tujrman F (2021) Optimized convolutional neural network by firefly algorithm for magnetic resonance image classification of glioma brain tumor grade. J Real-Time Image Proc 18:1085–1098CrossRef Bacanin N, Bezdan T, Venkatachalam K, Al-Tujrman F (2021) Optimized convolutional neural network by firefly algorithm for magnetic resonance image classification of glioma brain tumor grade. J Real-Time Image Proc 18:1085–1098CrossRef
23.
Zurück zum Zitat Lloyd DB, Boyd C, Govindaraju N (2008) Fast computation of general Fourier Transforms on GPUs. In: IEEE International Conference on Multimedia and Expo, pp 5–8 Lloyd DB, Boyd C, Govindaraju N (2008) Fast computation of general Fourier Transforms on GPUs. In: IEEE International Conference on Multimedia and Expo, pp 5–8
24.
Zurück zum Zitat Malkovsky SI, Sorokin AA, Tsoy GI et al (2021) Evaluating the performance of FFT library implementations on modern hybrid computing systems. J Supercomput 77(8):8326–8354CrossRef Malkovsky SI, Sorokin AA, Tsoy GI et al (2021) Evaluating the performance of FFT library implementations on modern hybrid computing systems. J Supercomput 77(8):8326–8354CrossRef
25.
Zurück zum Zitat Lee J, Kang H, Yeom H-J, Cheon S, Park J, Kim D (2021) Out-of-core GPU 2D-shift-FFT algorithm for ultra-high-resolution hologram generation. Optical Express 29:19094–19112CrossRef Lee J, Kang H, Yeom H-J, Cheon S, Park J, Kim D (2021) Out-of-core GPU 2D-shift-FFT algorithm for ultra-high-resolution hologram generation. Optical Express 29:19094–19112CrossRef
26.
Zurück zum Zitat Sorna A, Cheng X, D’Azevedo E, Won K, Tomov S (2018) Optimizing the fast Fourier transform using mixed precision on tensor core hardware. In: 25th IEEE International Conference on High Performance Computing Workshops (HiPCW), pp 3–7 Sorna A, Cheng X, D’Azevedo E, Won K, Tomov S (2018) Optimizing the fast Fourier transform using mixed precision on tensor core hardware. In: 25th IEEE International Conference on High Performance Computing Workshops (HiPCW), pp 3–7
27.
Zurück zum Zitat Stanković RS (1992) Some remarks on Fourier transforms and differential operators for digital functions. In: 22nd International Symposium on Multiple-valued Logic, Sendai, Japan, IEEE Press N.Y., pp 365–370 Stanković RS (1992) Some remarks on Fourier transforms and differential operators for digital functions. In: 22nd International Symposium on Multiple-valued Logic, Sendai, Japan, IEEE Press N.Y., pp 365–370
28.
Zurück zum Zitat Stanković RS (2017) The Reed-Muller-Fourier transform: computing methods and factorizations. In: Seising R, Allende-Cid H (eds) Claudio Moraga: a passion for multi-valued logic and soft computing. vol 349, Springer, pp 121–151CrossRef Stanković RS (2017) The Reed-Muller-Fourier transform: computing methods and factorizations. In: Seising R, Allende-Cid H (eds) Claudio Moraga: a passion for multi-valued logic and soft computing. vol 349, Springer, pp 121–151CrossRef
30.
Zurück zum Zitat NVIDIA Corporation (2017) NVIDIA Tesla V100 GPU Architecture v1.1 NVIDIA Corporation (2017) NVIDIA Tesla V100 GPU Architecture v1.1
31.
Zurück zum Zitat NVIDIA Corporation (2018) NVIDIA Turing GPU Architecture v01 NVIDIA Corporation (2018) NVIDIA Turing GPU Architecture v01
32.
Zurück zum Zitat NVIDIA Corporation (2020) NVIDIA A100 Tensor Core GPU Architecture v1.0 NVIDIA Corporation (2020) NVIDIA A100 Tensor Core GPU Architecture v1.0
33.
Zurück zum Zitat NVIDIA Corporation (2020) CUDA C++ Programming Guide v11.0 NVIDIA Corporation (2020) CUDA C++ Programming Guide v11.0
34.
Zurück zum Zitat Raihan MA, Goli N, Aamodt TM (2019) Modeling deep learning accelerator enabled GPUs. In: 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Madison, WI, USA, pp 79–92 Raihan MA, Goli N, Aamodt TM (2019) Modeling deep learning accelerator enabled GPUs. In: 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Madison, WI, USA, pp 79–92
35.
Zurück zum Zitat Yan D, Wang W, Chu X (2020) Demystifying tensor cores to optimize half-precision matrix multiply. In: 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, Louisiana, USA, pp 634–643 Yan D, Wang W, Chu X (2020) Demystifying tensor cores to optimize half-precision matrix multiply. In: 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, Louisiana, USA, pp 634–643
36.
Zurück zum Zitat Markidis S, Chien SWD, Laure E, Peng IB, Vetter JS (2018) NVIDIA tensor core programmability, performance & precision. In: 32nd IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Vancouver, British Columbia, Canada, pp 522–531 Markidis S, Chien SWD, Laure E, Peng IB, Vetter JS (2018) NVIDIA tensor core programmability, performance & precision. In: 32nd IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Vancouver, British Columbia, Canada, pp 522–531
37.
Zurück zum Zitat Loan CV (1992) Computational frameworks for the fast fourier transform. SIAM Loan CV (1992) Computational frameworks for the fast fourier transform. SIAM
38.
Metadaten
Titel
4-Valued spectral transforms implementation on GPU with Tensor Cores
verfasst von
Ivica Marković
Suzana Stojković
Publikationsdatum
02.07.2022
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 1/2023
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-022-04651-9

Weitere Artikel der Ausgabe 1/2023

The Journal of Supercomputing 1/2023 Zur Ausgabe