Skip to main content
Erschienen in: The Journal of Supercomputing 2/2013

01.05.2013

FPGA implementation of an exact dot product and its application in variable-precision floating-point arithmetic

verfasst von: Yuanwu Lei, Yong Dou, Yazhuo Dong, Jie Zhou, Fei Xia

Erschienen in: The Journal of Supercomputing | Ausgabe 2/2013

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The current paper explores the capability and flexibility of field programmable gate-arrays (FPGAs) to implement variable-precision floating-point (VP) arithmetic. First, the VP exact dot product algorithm, which uses exact fixed-point operations to obtain an exact result, is presented. A VP multiplication and accumulation unit (VPMAC) on FPGA is then proposed. In the proposed design, the parallel multipliers generate the partial products of mantissa multiplication in parallel, which is the most time-consuming part in the VP multiplication and accumulation operation. This method fully utilizes DSP performance on FPGAs to enhance the performance of the VPMAC unit. Several other schemes, such as two-level RAM bank, carry-save accumulation, and partial summation, are used to achieve high frequency and pipeline throughput in the product accumulation stage. The typical algorithms in Basic Linear Algorithm Subprograms (i.e., vector dot product, general matrix vector product, and general matrix multiply product), LU decomposition, and Modified Gram–Schmidt QR decomposition, are used to evaluate the performance of the VPMAC unit. Two schemes, called the VPMAC coprocessor and matrix accelerator, are presented to implement these applications. Finally, prototypes of the VPMAC unit and the matrix accelerator based on the VPMAC unit are created on a Xilinx XC6VLX760 FPGA chip.
Compared with a parallel software implementation based on OpenMP running on an Intel Xeon Quad-core E5620 CPU, the VPMAC coprocessor, equipped with one VPMAC unit, achieves a maximum acceleration factor of 18X. Moreover, the matrix accelerator, which mainly consists of a linear array of eight processing elements, achieves 12X–65X better performance.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Yun H, Chris D (2001) Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications. J Supercomput 18(3):259–277 MATHCrossRef Yun H, Chris D (2001) Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications. J Supercomput 18(3):259–277 MATHCrossRef
2.
Zurück zum Zitat Bailey DH (2005) High-precision floating-point arithmetic in scientific computation. Comput Sci Eng 7(3):54–61 CrossRef Bailey DH (2005) High-precision floating-point arithmetic in scientific computation. Comput Sci Eng 7(3):54–61 CrossRef
4.
Zurück zum Zitat Fousse L, Hanrot G, Lefevre V, Pelissier P, Zimmermann P (2007) MPFR: a multiple-precision binary floating-point library with correct rounding. Trans Math Softw 33(2):1–15 MathSciNetCrossRef Fousse L, Hanrot G, Lefevre V, Pelissier P, Zimmermann P (2007) MPFR: a multiple-precision binary floating-point library with correct rounding. Trans Math Softw 33(2):1–15 MathSciNetCrossRef
6.
Zurück zum Zitat Fujimoto J, Ishikawa T, Perret-Gallix D (2005) High precision numerical computations—a case for an happy design. ACPP IRG note, ACPP-N-1: KEK-CP-164 Fujimoto J, Ishikawa T, Perret-Gallix D (2005) High precision numerical computations—a case for an happy design. ACPP IRG note, ACPP-N-1: KEK-CP-164
7.
Zurück zum Zitat Cohen MS, Hull TE, Hamarcher VC (1983) A controlled-precision decimal arithmetic unit. IEEE Trans Comput C-32:370–377 CrossRef Cohen MS, Hull TE, Hamarcher VC (1983) A controlled-precision decimal arithmetic unit. IEEE Trans Comput C-32:370–377 CrossRef
8.
Zurück zum Zitat Chiarulli DM, Ruaa WG, Buell DA (1985) DRAFT: a dynamically reconfigurable processor for integer arithmetic. In: Proceedings of the 7th symposium on computer arithmetic, pp 309–318 Chiarulli DM, Ruaa WG, Buell DA (1985) DRAFT: a dynamically reconfigurable processor for integer arithmetic. In: Proceedings of the 7th symposium on computer arithmetic, pp 309–318
9.
Zurück zum Zitat Carter TM (1989) Cascade: hardware for high/variable precision arithmetic. In: Proceedings of the 9th symposium on computer arithmetic, pp 184–191 CrossRef Carter TM (1989) Cascade: hardware for high/variable precision arithmetic. In: Proceedings of the 9th symposium on computer arithmetic, pp 184–191 CrossRef
10.
Zurück zum Zitat Schulte MJ, Swartzlander EE Jr (2000) A family of variable-precision, interval arithmetic processors. IEEE Trans Comput 49(5):387–397 CrossRef Schulte MJ, Swartzlander EE Jr (2000) A family of variable-precision, interval arithmetic processors. IEEE Trans Comput 49(5):387–397 CrossRef
11.
Zurück zum Zitat El-Araby E, Gonzalez I, El-Ghazawi T (2007) Bringing high-performance reconfigurable computing to exact computations. In: Proceedings of FPL 2007, pp 79–85 El-Araby E, Gonzalez I, El-Ghazawi T (2007) Bringing high-performance reconfigurable computing to exact computations. In: Proceedings of FPL 2007, pp 79–85
12.
Zurück zum Zitat Alexandre FT, Milos DE (1998) A variable long-precision arithmetic unit design for reconfigurable coprocessor architectures. In: Proceedings of FCCM 1998 Alexandre FT, Milos DE (1998) A variable long-precision arithmetic unit design for reconfigurable coprocessor architectures. In: Proceedings of FCCM 1998
13.
Zurück zum Zitat Hormigo J, Villalba J (2000) A hardware algorithm for variable-precision division. In: Proceedings of the 4th conference on real numbers and computers, pp 1–7 Hormigo J, Villalba J (2000) A hardware algorithm for variable-precision division. In: Proceedings of the 4th conference on real numbers and computers, pp 1–7
14.
Zurück zum Zitat Hormigo J, Villalba J, Schulte M (2000) A hardware algorithm for variable-precision logarithm. In: Proceedings of ASAP2000, pp 215–224 Hormigo J, Villalba J, Schulte M (2000) A hardware algorithm for variable-precision logarithm. In: Proceedings of ASAP2000, pp 215–224
15.
Zurück zum Zitat Hormigo J, Villalba J, Zapata EL (1999) Interval sine and cosine functions computation based on variable-precision cordic algorithm. In: Proceedings of Arith 1999, pp 186–193 Hormigo J, Villalba J, Zapata EL (1999) Interval sine and cosine functions computation based on variable-precision cordic algorithm. In: Proceedings of Arith 1999, pp 186–193
16.
Zurück zum Zitat Hormigo J, Villalba J, Zapata EL (2004) CORDIC processor for variable-precision interval arithmetic. J VLSI Signal Process 37:21–39 CrossRef Hormigo J, Villalba J, Zapata EL (2004) CORDIC processor for variable-precision interval arithmetic. J VLSI Signal Process 37:21–39 CrossRef
17.
Zurück zum Zitat Saez E, Villalba J, Hormigo J, Quiles FJ, Benavides JI, Zapata EL (1998) FPGA implementation of a variable precision CORDIC processor. In: Proceedings of 13th conf on design of circuits and integrated systems (DCIS’98), pp 604–609 Saez E, Villalba J, Hormigo J, Quiles FJ, Benavides JI, Zapata EL (1998) FPGA implementation of a variable precision CORDIC processor. In: Proceedings of 13th conf on design of circuits and integrated systems (DCIS’98), pp 604–609
18.
Zurück zum Zitat Lei Y, Dou Y, Zhou J (2011) FPGA-specific custom VLIW architecture for arbitrary precision floating-point arithmetic. IEICE Trans Inf Syst E94-D(11):2173–2183 CrossRef Lei Y, Dou Y, Zhou J (2011) FPGA-specific custom VLIW architecture for arbitrary precision floating-point arithmetic. IEICE Trans Inf Syst E94-D(11):2173–2183 CrossRef
19.
Zurück zum Zitat Li XS, Demmel JW, Bailey DH, Henry G (2002) Design, implementation and testing of extended and mixed precision blas. ACM Trans Math Softw 18(2):152–205 CrossRef Li XS, Demmel JW, Bailey DH, Henry G (2002) Design, implementation and testing of extended and mixed precision blas. ACM Trans Math Softw 18(2):152–205 CrossRef
20.
Zurück zum Zitat Rump SM (1988) Algorithms for verified inclusions-theory and practice. In: Moore RE (ed) Reliability in computing. Academic Press, San Diego, pp C109–C126 Rump SM (1988) Algorithms for verified inclusions-theory and practice. In: Moore RE (ed) Reliability in computing. Academic Press, San Diego, pp C109–C126
21.
Zurück zum Zitat Kulisch U (1997) The fifth floating-point operation for top-performance computers. Universitat Karlsruhe Kulisch U (1997) The fifth floating-point operation for top-performance computers. Universitat Karlsruhe
22.
Zurück zum Zitat IEEE (2008) Standard for binary floating point arithmetic ansi/ieee standard 754-2008. The Institute of Electrical and Electronic Engineers, Inc. Revised version of original 754-1985 Standard IEEE (2008) Standard for binary floating point arithmetic ansi/ieee standard 754-2008. The Institute of Electrical and Electronic Engineers, Inc. Revised version of original 754-1985 Standard
23.
Zurück zum Zitat Edmonson W, Melquiond G (2009) IEEE interval standard working group—p1788: current status. In: Proceedings of Arith 2009, pp 183–190 Edmonson W, Melquiond G (2009) IEEE interval standard working group—p1788: current status. In: Proceedings of Arith 2009, pp 183–190
24.
26.
Zurück zum Zitat Lopes AR, Constantinides GA (2010) A fused hybrid floating-point and fixed-point dot-product for FPGAs. In: Proceedings of ARC 2010, vol 5992, pp 157–168 Lopes AR, Constantinides GA (2010) A fused hybrid floating-point and fixed-point dot-product for FPGAs. In: Proceedings of ARC 2010, vol 5992, pp 157–168
27.
Zurück zum Zitat Manoukian MV, Constantinides GA (2011) Accurate floating point arithmetic through hardware error-free transformations. In: Proceedings of ARC 2011, vol 6578, pp 94–101 Manoukian MV, Constantinides GA (2011) Accurate floating point arithmetic through hardware error-free transformations. In: Proceedings of ARC 2011, vol 6578, pp 94–101
28.
Zurück zum Zitat Dinechin FD, Pasca B, Cret O, Tudoran R (2008) An fpga-specific approach to floating-point accumulation and sum-of-products. In: Proceedings of FPT 2008, pp 33–40 Dinechin FD, Pasca B, Cret O, Tudoran R (2008) An fpga-specific approach to floating-point accumulation and sum-of-products. In: Proceedings of FPT 2008, pp 33–40
29.
Zurück zum Zitat Kulisch U (2008) Computer arithmetic and validity: theory, implementation, and applications. de Gruyter, Berlin MATH Kulisch U (2008) Computer arithmetic and validity: theory, implementation, and applications. de Gruyter, Berlin MATH
30.
Zurück zum Zitat Muller M, Rub C, Rulling W (1991) Exact accumulation of floating-point numbers. In: Proceedings of Arith 1991, pp 64–69 Muller M, Rub C, Rulling W (1991) Exact accumulation of floating-point numbers. In: Proceedings of Arith 1991, pp 64–69
31.
Zurück zum Zitat Knofel A (1991) A fast hardware units for the computation of accurate dot products. In: Proceedings of Arith 1991, pp 70–74 Knofel A (1991) A fast hardware units for the computation of accurate dot products. In: Proceedings of Arith 1991, pp 70–74
32.
Zurück zum Zitat Dou Y, Lei Y, Wu G (2010) FPGA accelerating double/quad-double high precision floating-point application for exascale computing. In: Proceedings of ICS 2010, pp 325–336 Dou Y, Lei Y, Wu G (2010) FPGA accelerating double/quad-double high precision floating-point application for exascale computing. In: Proceedings of ICS 2010, pp 325–336
33.
Zurück zum Zitat Underwood K (2004) FPGAs vs. CPUs: trends in peak floating-point performance. In: Proceedings of FPGA 2004, pp 171–180 Underwood K (2004) FPGAs vs. CPUs: trends in peak floating-point performance. In: Proceedings of FPGA 2004, pp 171–180
34.
Zurück zum Zitat Schulte MJ, Swartzlander EE Jr (1995) Hardware design and arithmetic algorithms for a variable-precision, interval arithmetic coprocessor. In: Proceedings of the 12th symposium on computer arithmetic, pp 222–228 CrossRef Schulte MJ, Swartzlander EE Jr (1995) Hardware design and arithmetic algorithms for a variable-precision, interval arithmetic coprocessor. In: Proceedings of the 12th symposium on computer arithmetic, pp 222–228 CrossRef
35.
Zurück zum Zitat Higham NJ (2002) Accuracy and stability of numerical algorithms, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia MATHCrossRef Higham NJ (2002) Accuracy and stability of numerical algorithms, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia MATHCrossRef
36.
Zurück zum Zitat Dou Y, Zhou J, Wu G, Jiang J, Lei Y (2010) A unified co-processor architecture for matrix decomposition. J Comput Sci Technol 25(4):874–885 MathSciNetCrossRef Dou Y, Zhou J, Wu G, Jiang J, Lei Y (2010) A unified co-processor architecture for matrix decomposition. J Comput Sci Technol 25(4):874–885 MathSciNetCrossRef
37.
Zurück zum Zitat Dou Y, Vassiliadis S, Kuzmanov GK, Gaydadjiev GN (2005) 64-bit floating-point FPGA matrix multiplication. In: Proceedings of FPGA 2005, pp 86–95 Dou Y, Vassiliadis S, Kuzmanov GK, Gaydadjiev GN (2005) 64-bit floating-point FPGA matrix multiplication. In: Proceedings of FPGA 2005, pp 86–95
38.
Zurück zum Zitat Fousse L, Hanrot G, Lefevre V, Pelissier P, Zimmermann P (2007) MPFR: a multiple-precision binary floating-point library with correct rounding. Trans Math Softw 33(2):1–15 MathSciNetCrossRef Fousse L, Hanrot G, Lefevre V, Pelissier P, Zimmermann P (2007) MPFR: a multiple-precision binary floating-point library with correct rounding. Trans Math Softw 33(2):1–15 MathSciNetCrossRef
Metadaten
Titel
FPGA implementation of an exact dot product and its application in variable-precision floating-point arithmetic
verfasst von
Yuanwu Lei
Yong Dou
Yazhuo Dong
Jie Zhou
Fei Xia
Publikationsdatum
01.05.2013
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 2/2013
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-012-0860-0

Weitere Artikel der Ausgabe 2/2013

The Journal of Supercomputing 2/2013 Zur Ausgabe

Premium Partner