2017 | OriginalPaper | Book chapter

10. An Approximation Workflow for Exploiting Data-Level Parallelism in FPGA Acceleration

Abstract

Modern applications including graphics, multimedia, web search, and data analytics not only can benefit from acceleration but also exhibit a significant degree of tolerance to imprecise computation. This amenability to approximation provides an opportunity to trade quality of results for higher performance and better resource utilization. Exploiting this opportunity is particularly important for FPGA accelerators, which are inherently subject to many resource constraints. To better utilize FPGA resources, we devise Grater, an automated design workflow for FPGA accelerators that leverages imprecise computation to increase data-level parallelism and achieve higher computational throughput. The core of our workflow is a source-to-source compiler that takes in an input kernel and applies a novel optimization technique that selectively reduces the precision of the kernel's data and operations. Reducing this precision decreases the area required to synthesize the kernel on the FPGA, which allows a larger number of operations and parallel kernels to be integrated into the fixed area of the FPGA. The larger number of integrated kernels provides more hardware context to better exploit data-level parallelism in the target applications. To effectively explore the design space of approximate kernels, we use a genetic algorithm to find a subset of safe-to-approximate operations and data elements and then tune their precision levels until the desired output quality is achieved. Grater is a purely software technique and does not require any changes to the underlying FPGA hardware. We evaluate Grater on a diverse set of data-intensive OpenCL benchmarks from the AMD SDK. Synthesis results on a modern Altera FPGA show that our approximation workflow yields 1.4×–3.0× higher throughput with less than 1% quality loss.
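The genetic-algorithm search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and is not Grater's implementation. The following pieces are all hypothetical:

- the toy dot-product `kernel` and the variable names in `VARS`;
- the 64/32/16-bit storage widths, used as an area proxy;
- the relative-error quality metric;
- the GA parameters (`pop`, `gens`), truncation selection, uniform crossover, and point mutation.

Reduced precision is emulated by round-tripping values through the `struct` formats `"f"` (binary32) and `"e"` (binary16).

```python
import random
import struct

# Hypothetical precision ladder mirroring OpenCL's double -> float -> half.
BITS = (64, 32, 16)       # storage width per level, used here as an area proxy
FMT = (None, "f", "e")    # struct formats that emulate binary32 / binary16 rounding
VARS = ("a", "x", "acc")  # hypothetical kernel variables whose precision is tuned


def q(value, level):
    """Round `value` to the precision of `level` (level 0 = double, unchanged)."""
    if level == 0:
        return value
    return struct.unpack(FMT[level], struct.pack(FMT[level], value))[0]


def kernel(a, x, g):
    """Toy dot-product kernel evaluated with per-variable precision levels `g`."""
    acc = 0.0
    for ai, xi in zip(a, x):
        acc = q(acc + q(ai, g[0]) * q(xi, g[1]), g[2])
    return acc


def quality_loss(g, a, x, ref):
    """Relative output error of genome `g` against the full-precision result."""
    return abs(kernel(a, x, g) - ref) / abs(ref)


def search(a, x, target=0.01, critical=(), pop=12, gens=40, seed=0):
    """Genetic search for the cheapest precision assignment meeting `target`."""
    rng = random.Random(seed)
    n = len(VARS)
    ref = kernel(a, x, (0,) * n)

    def repair(g):
        # Variables annotated as critical are pinned to full precision.
        return tuple(0 if VARS[i] in critical else lvl for i, lvl in enumerate(g))

    def cost(g):
        # Infeasible genomes (quality loss above target) rank last.
        if quality_loss(g, a, x, ref) > target:
            return float("inf")
        return sum(BITS[lvl] for lvl in g)

    # Seed with the all-double genome so a feasible candidate always exists.
    population = [(0,) * n] + [
        repair(tuple(rng.randrange(len(BITS)) for _ in range(n)))
        for _ in range(pop - 1)
    ]
    for _ in range(gens):
        population.sort(key=cost)
        parents = population[: pop // 2]  # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop:
            p1, p2 = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            if rng.random() < 0.3:
                child[rng.randrange(n)] = rng.randrange(len(BITS))  # point mutation
            children.append(repair(child))
        population = parents + children
    return min(population, key=cost)
```

Seeding the population with the all-double genome, and always keeping the best half, means the returned genome meets the quality target by construction. For example, `search(a, x, critical=("acc",))` keeps the accumulator in double and lets the search demote only `a` and `x`. This mirrors the non-approximable annotation mentioned in footnote 3.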

Footnotes
1. Replication is handled in Altera OpenCL by setting num_compute_units as a kernel attribute.
2. We limit our optimization search to the variable types available in OpenCL, rather than searching for precision within a type itself [10], because a source-to-source transformer must work at the same level of abstraction as the input programming language. Grater generates standard OpenCL approximate kernels, so this source-to-source translation lets the Altera OpenCL synthesis toolchain benefit from it.
3. Grater also enables the programmer to annotate critical variables as non-approximable, so that the transcompiler does not change their precision.
4. Note that the accelerated profiling process on a GPU takes on the order of milliseconds to determine whether the kernel meets the quality-of-result target, whereas synthesizing the approximate OpenCL kernels on a Stratix V FPGA takes more than an hour on average.
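Footnotes 2 and 3 describe a transformer with two properties. It changes precision only by switching among standard OpenCL types, and it leaves annotated variables untouched. The Python sketch below illustrates that idea on kernel source text.

It is a hypothetical toy, not Grater's transcompiler. The function `demote`, the `plan` dictionary, and the regular expression are assumptions. It only recognizes simple declarations such as `float x`, `float *x`, or `__global float *x`.

A real transformer would work on a parsed AST. It would also need to enable the `cl_khr_fp16` extension (`#pragma OPENCL EXTENSION cl_khr_fp16 : enable`), because OpenCL C restricts `half` to pointers to storage unless that extension is enabled.

```python
import re

# Hypothetical type ladder over standard OpenCL scalar types (widest first).
LADDER = ("double", "float", "half")


def demote(src, plan, critical=()):
    """Rewrite the declared type of each variable in `plan` to its new type.

    `plan` maps a variable name to a LADDER type. Names in `critical` are
    skipped, mirroring a non-approximable annotation. Only simple
    declarations like `float x`, `float *x`, `__global float *x` match;
    this is a toy text rewrite, not a parser.
    """
    for name, new_type in plan.items():
        if name in critical or new_type not in LADDER:
            continue
        # Old scalar type, then either "*" (with optional spaces) or whitespace,
        # then the exact identifier. Vector types like float4 do not match.
        pattern = r"\b(?:double|float|half)(\s*\*\s*|\s+)" + re.escape(name) + r"\b"
        src = re.sub(pattern, lambda m: new_type + m.group(1) + name, src)
    return src
```

For instance, demoting `x` and `acc` to `half` while marking `a` as critical rewrites only the first two declarations. This leaves `float a` exactly as written.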
 
References
1. A. Yazdanbakhsh, J. Park, H. Sharma, P. Lotfi-Kamran, H. Esmaeilzadeh, Neural acceleration for GPU throughput processors, in Proceedings of the 48th International Symposium on Microarchitecture, MICRO-48 (ACM, New York, NY, USA, 2015), pp. 482–493
2. A. Yazdanbakhsh, D. Mahajan, B. Thwaites, J. Park, A. Nagendrakumar, S. Sethuraman, K. Ramkrishnan, N. Ravindran, R. Jariwala, A. Rahimi, H. Esmaeilzadeh, K. Bazargan, Axilog: language support for approximate hardware design, in 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE) (2015), pp. 812–817
3. T. Moreau, M. Wyse, J. Nelson, A. Sampson, H. Esmaeilzadeh, L. Ceze, M. Oskin, SNNAP: approximate computing on programmable SoCs via neural acceleration, in 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA) (2015), pp. 603–614
4. A.B. Kahng, S. Kang, Accuracy-configurable adder for approximate arithmetic designs, in 2012 49th ACM/EDAC/IEEE Design Automation Conference (DAC) (2012), pp. 820–825
5. P. Kulkarni, P. Gupta, M. Ercegovac, Trading accuracy for power with an underdesigned multiplier architecture, in 2011 24th International Conference on VLSI Design (VLSI Design) (2011), pp. 346–351
9. D. Chen, D. Singh, Invited paper: using OpenCL to evaluate the efficiency of CPUs, GPUs and FPGAs for information filtering, in 2012 22nd International Conference on Field Programmable Logic and Applications (FPL) (2012), pp. 5–12
10. E. Schkufza, R. Sharma, A. Aiken, Stochastic optimization of floating-point programs with tunable precision, in Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI'14 (ACM, New York, NY, USA, 2014), pp. 53–64
11. A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd edn., Natural Computing Series (Springer, Heidelberg, 2007)
13. S. Misailovic, M. Carbin, S. Achour, Z. Qi, M.C. Rinard, Chisel: reliability- and accuracy-aware optimization of approximate computational kernels, in Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA'14 (ACM, New York, NY, USA, 2014), pp. 309–328
14. P. Roy, R. Ray, C. Wang, W.F. Wong, ASAC: automatic sensitivity analysis for approximate computing, in Proceedings of the 2014 SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems, LCTES'14 (ACM, New York, NY, USA, 2014), pp. 95–104
Metadata
Title
An Approximation Workflow for Exploiting Data-Level Parallelism in FPGA Acceleration
Authors
Abbas Rahimi
Luca Benini
Rajesh K. Gupta
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-53768-9_10
