Published in: The Journal of Supercomputing 2/2017

10 June 2016

Optimal dynamic data layouts for 2D FFT on 3D memory integrated FPGA

Authors: Ren Chen, Shreyas G. Singapura, Viktor K. Prasanna

Abstract

FPGAs are widely used to accelerate applications. For many data-intensive applications, memory bandwidth limits performance. 3D memories with through-silicon-via connections offer a potential solution to these latency and bandwidth limitations. In this paper, we revisit the classic 2D FFT problem to evaluate the performance of a 3D memory integrated FPGA. Fully utilizing the fine-grained parallelism in 3D memory requires data layouts that take the structure and organization of the memory into account. We propose dynamic data layouts for optimizing the performance of the 3D architecture. In 2D FFT, data are accessed in row-major order in the first phase and in column-major order in the second phase. The column-major access results in high memory latency and low bandwidth because of the high row activation overhead of the memory. Using the proposed dynamic data layouts, we improve memory access performance in the second phase without degrading the performance of the first phase. Exploiting parallelism in the third dimension of the memory increases data parallelism and improves performance further. We adopt a model-based approach for the 3D memory and perform experiments on the FPGA to validate our analysis and evaluate the performance. Compared with the baseline architecture, our approach achieves up to 40× peak memory bandwidth utilization for column-wise FFT, resulting in approximately 97% improvement in throughput for the complete 2D FFT application.
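The row activation overhead described in the abstract can be illustrated with a toy model. The sketch below is not the authors' layout or memory model: it assumes a single DRAM bank with one open row, a hypothetical row size of 64 elements (`ROW_BUF`), a 64 × 64 matrix, and a simple static 8 × 8 block layout (`tiled_addr`) chosen only for illustration. It counts how often a new DRAM row must be activated for each access order.

```python
def count_row_activations(addresses, row_buffer_elems):
    """Count row activations for an address stream.

    Assumption: one bank, one open row at a time; every access to a
    different DRAM row than the currently open one costs an activation.
    """
    activations = 0
    open_row = None
    for addr in addresses:
        row = addr // row_buffer_elems
        if row != open_row:
            activations += 1
            open_row = row
    return activations


def row_major_addr(r, c, n):
    """Address of element (r, c) in a plain row-major n x n matrix."""
    return r * n + c


def tiled_addr(r, c, n, tile):
    """Address of element (r, c) when tile x tile blocks are contiguous.

    Hypothetical block layout used for illustration only.
    """
    tiles_per_row = n // tile
    tile_id = (r // tile) * tiles_per_row + (c // tile)
    return tile_id * tile * tile + (r % tile) * tile + (c % tile)


N = 64        # matrix is N x N (illustrative size)
ROW_BUF = 64  # elements per DRAM row (hypothetical)
TILE = 8      # TILE * TILE == ROW_BUF, so one tile fills one DRAM row

# Phase 1 visits elements row by row, phase 2 column by column.
row_order = [(r, c) for r in range(N) for c in range(N)]
col_order = [(r, c) for c in range(N) for r in range(N)]

row_phase = count_row_activations(
    [row_major_addr(r, c, N) for r, c in row_order], ROW_BUF)
col_phase = count_row_activations(
    [row_major_addr(r, c, N) for r, c in col_order], ROW_BUF)
col_phase_tiled = count_row_activations(
    [tiled_addr(r, c, N, TILE) for r, c in col_order], ROW_BUF)
row_phase_tiled = count_row_activations(
    [tiled_addr(r, c, N, TILE) for r, c in row_order], ROW_BUF)

print("row-major layout, row phase:   ", row_phase)
print("row-major layout, column phase:", col_phase)
print("tiled layout, column phase:    ", col_phase_tiled)
print("tiled layout, row phase:       ", row_phase_tiled)
```

Under these assumptions, the row-major layout needs 64 activations in the row phase but 4096 in the column phase, one per element. The tiled layout reduces the column phase to 512 activations. This static tiling also raises the row-phase count to 512 in the toy model, so it does not reproduce the authors' result that the first phase is unaffected.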

Metadata
Title
Optimal dynamic data layouts for 2D FFT on 3D memory integrated FPGA
Authors
Ren Chen
Shreyas G. Singapura
Viktor K. Prasanna
Publication date
10 June 2016
Publisher
Springer US
Published in
The Journal of Supercomputing, Issue 2/2017
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-016-1772-1