
The Journal of Supercomputing

Published: 1 February 2013

Accelerating thread-intensive and explicit memory management programs with dynamic partial reconfiguration

Authors: Qianming Yang, Mei Wen, Nan Wu, Chunyuan Zhang

Published in: The Journal of Supercomputing | Issue 2/2013


Abstract

Recent research has shown that field programmable gate arrays (FPGAs) have great potential for accelerating demanding applications, such as high-performance digital signal processing applications with low-volume markets. One disadvantage of FPGAs is the loss of architectural generality; however, their reconfigurability allows them to be reprogrammed for other applications. Therefore, a uniform FPGA-based architecture, an efficient programming model, and a simple mapping method are paramount for the wide acceptance of FPGA technology. This paper presents MASALA, a dynamically reconfigurable FPGA-based accelerator for parallel programs written in thread-intensive and explicit memory management (TEMM) programming models. Our system uses a TEMM programming model to parallelize demanding applications, which involves decomposing an application into separate thread blocks and decoupling computation from data loads and stores. Hardware engines are incorporated into MASALA as dynamic partial reconfiguration modules, each of which encapsulates a thread process engine that implements a thread's functionality in hardware. MASALA also includes a data dispatching scheme that enables explicit communication across multiple levels of the memory hierarchy, such as between hardware engines and between host processors and hardware engines. Finally, this paper presents MASALA-SX, a multi-FPGA prototype of the proposed architecture. An experiment on large synthetic aperture radar (SAR) image formation shows that MASALA's architecture facilitates the construction of TEMM program accelerators, delivering higher performance and lower power consumption than current CPU platforms without sacrificing programmability, flexibility, or scalability.
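The abstract describes the TEMM model only at a high level: an application is split into independent thread blocks, and data movement (loads and stores) is separated from computation. The following is a minimal, sequential Python sketch of that general pattern, not MASALA's implementation. It assumes a flat Python list stands in for global (off-chip) memory and a copied list stands in for a thread block's local buffer; the names `load_block`, `store_block`, and `run_temm` are hypothetical. The load for block i+1 is issued before block i is computed, only to show where a decoupled load unit could overlap data movement with computation in hardware.

```python
from typing import Callable, List, Optional, Tuple

Trace = List[Tuple[str, int]]


def load_block(global_mem: List[float], start: int, size: int) -> List[float]:
    """Load stage: explicitly copy one block from global memory into a local buffer."""
    return list(global_mem[start:start + size])


def store_block(global_mem: List[float], start: int, local: List[float]) -> None:
    """Store stage: explicitly write a local buffer back to global memory."""
    global_mem[start:start + len(local)] = local


def run_temm(global_mem: List[float], block_size: int,
             kernel: Callable[[List[float]], List[float]],
             trace: Optional[Trace] = None) -> None:
    """Process global_mem block by block with load/compute/store decoupled.

    The kernel only ever sees a local buffer, never global memory. Loads are
    issued one block ahead (double buffering), which is safe here because
    blocks cover disjoint index ranges.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    log: Trace = trace if trace is not None else []
    starts = list(range(0, len(global_mem), block_size))
    if not starts:
        return
    next_buf = load_block(global_mem, starts[0], block_size)
    log.append(("load", 0))
    for i, start in enumerate(starts):
        current = next_buf
        if i + 1 < len(starts):
            # Prefetch the next block before computing the current one.
            next_buf = load_block(global_mem, starts[i + 1], block_size)
            log.append(("load", i + 1))
        result = kernel(current)  # compute stage touches local data only
        log.append(("compute", i))
        if len(result) != len(current):
            raise ValueError("kernel must preserve block length")
        store_block(global_mem, start, result)
        log.append(("store", i))


if __name__ == "__main__":
    data = [float(x) for x in range(10)]
    events: Trace = []
    run_temm(data, block_size=4, kernel=lambda buf: [2.0 * v for v in buf], trace=events)
    print(data)
    print(events)
```

In this sketch the trace records the load of block 1 before the computation of block 0, which is the ordering a hardware engine with separate load/store and compute units would exploit; a real accelerator would run these stages concurrently rather than interleaving them in one thread.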



Publisher: Springer US
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-012-0828-0
