Published in: The Journal of Supercomputing, Issue 2/2013

01-02-2013

Accelerating thread-intensive and explicit memory management programs with dynamic partial reconfiguration

Authors: Qianming Yang, Mei Wen, Nan Wu, Chunyuan Zhang




Abstract

Recent research has shown that field programmable gate arrays (FPGAs) have great potential for accelerating demanding applications, such as high-performance digital signal processing applications with low-volume markets. One disadvantage of FPGAs is the loss of architectural generality; however, their reconfigurability allows them to be reprogrammed for other applications. Therefore, a uniform FPGA-based architecture, an efficient programming model, and a simple mapping method are paramount for the wide acceptance of FPGA technology. This paper presents MASALA, a dynamically reconfigurable FPGA-based accelerator for parallel programs written in thread-intensive and explicit memory management (TEMM) programming models. Our system uses a TEMM programming model to parallelize demanding applications, which involves decomposing an application into separate thread blocks and decoupling computation from data loads and stores. Hardware engines are incorporated into MASALA as dynamic partial reconfiguration modules, each of which encapsulates a thread process engine that implements a thread's functionality in hardware. MASALA also includes a data dispatching scheme that enables explicit communication across multiple levels of the memory hierarchy, such as between hardware engines and between host processors and hardware engines. Finally, this paper presents MASALA-SX, a multi-FPGA prototype of the proposed architecture. A large-scale synthetic aperture radar (SAR) image formation experiment shows that MASALA’s architecture facilitates the construction of TEMM program accelerators, delivering higher performance and lower power consumption than current CPU platforms without sacrificing programmability, flexibility, or scalability.
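The abstract describes the TEMM model only at a high level. As a rough illustration, assuming a simple one-dimensional data set and an element-wise kernel, the Python sketch below decomposes the input into thread blocks and separates explicit load, compute, and store steps, using two local buffers so that the next block can be fetched while the current one is processed. The names `run_temm`, `load`, `store`, and `double_gain` are hypothetical; this is not MASALA's actual API or hardware pipeline, and the stages run sequentially here rather than concurrently.

```python
from typing import Callable, List


def run_temm(data: List[float], block_size: int,
             kernel: Callable[[List[float]], List[float]]) -> List[float]:
    """Hypothetical TEMM-style driver: thread blocks with decoupled load/compute/store.

    Two explicit local buffers implement double buffering: block i+1 is
    loaded into one buffer while the kernel works on block i in the other.
    Here the stages run one after another; on an accelerator the load of
    block i+1 would overlap with the compute of block i.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    # Decompose the index space into independent thread blocks.
    blocks = [(start, min(start + block_size, len(data)))
              for start in range(0, len(data), block_size)]
    out = [0.0] * len(data)
    buffers: List[List[float]] = [[], []]  # two explicit local stores

    def load(i: int) -> None:
        lo, hi = blocks[i]
        buffers[i % 2] = data[lo:hi]  # explicit copy from global memory

    def store(i: int, result: List[float]) -> None:
        lo, hi = blocks[i]
        if len(result) != hi - lo:
            raise ValueError("kernel must preserve block length")
        out[lo:hi] = result  # explicit write-back to global memory

    if blocks:
        load(0)  # prologue: fetch the first block
    for i in range(len(blocks)):
        if i + 1 < len(blocks):
            load(i + 1)  # prefetch the next block into the other buffer
        result = kernel(buffers[i % 2])  # compute touches local data only
        store(i, result)
    return out


if __name__ == "__main__":
    def double_gain(block: List[float]) -> List[float]:
        # Hypothetical element-wise kernel standing in for real SAR processing.
        return [2.0 * x for x in block]

    print(run_temm([1.0, 2.0, 3.0, 4.0, 5.0], block_size=2, kernel=double_gain))
    # → [2.0, 4.0, 6.0, 8.0, 10.0]
```

Keeping the kernel restricted to its local buffer is the point of the sketch: all data movement is visible in `load` and `store`, which is what makes it possible to schedule transfers separately from computation.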


Metadata
Publisher: Springer US
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-012-0828-0
