2019 | Original Paper | Book Chapter

Data-Intensive Computing Acceleration with Python in Xilinx FPGA

Authors: Yalin Yang, Linjie Xu, Zichen Xu, Yuhao Wang

Published in: Data Quality and Trust in Big Data

Publisher: Springer International Publishing

Abstract

Data-intensive workloads drive the development of hardware design. Such data-intensive services are driven by the rising trend of novel machine learning techniques, such as CNNs and RNNs, applied over massive chunks of data objects. These services require novel devices with configurable high I/O throughput (i.e., for data-based model training) and uniquely large computation capability (i.e., for a large number of convolutional operations). In this paper, we present our early work on realizing a Python-based Field-Programmable Gate Array (FPGA) system to support such data-intensive services. In our current system, we deploy a light layer of CNN optimization and a mixed hardware setup, including multiple FPGA/GPU nodes, to provide performance acceleration at run time. Our prototype can support popular machine learning platforms, such as Caffe. Our initial empirical results show that our system can handle all data-intensive learning services well.

Metadata
Title
Data-Intensive Computing Acceleration with Python in Xilinx FPGA
Authors
Yalin Yang
Linjie Xu
Zichen Xu
Yuhao Wang
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-19143-6_8