Published in: The Journal of Supercomputing 2/2021

28 May 2020

A systematic literature review on hardware implementation of artificial intelligence algorithms

Authors: Manar Abu Talib, Sohaib Majzoub, Qassim Nasir, Dina Jamal

Abstract

Artificial intelligence (AI) and machine learning (ML) tools play a significant role in the recent evolution of smart systems. AI solutions are driving a significant shift in many fields, such as healthcare, autonomous airplanes and vehicles, security, marketing customer profiling and other diverse areas. One of the main challenges hindering the potential of AI is the demand for high-performance computation resources. Recently, hardware accelerators have been developed to provide the computational power that AI and ML tools need. In the literature, hardware accelerators are built using FPGAs, GPUs and ASICs to accelerate computationally intensive tasks. These accelerators provide high-performance hardware while preserving the required accuracy. In this work, we present a systematic literature review that explores the available hardware accelerators for AI and ML tools. More than 169 research papers published between 2009 and 2019 are studied and analysed.

Literatur
1.
Zurück zum Zitat Sze V, Chen Y-H, Yang T-J, Emer JS (2017) Efficient processing of deep neural networks: a tutorial and survey. Proc IEEE 105(12):2295–2329CrossRef Sze V, Chen Y-H, Yang T-J, Emer JS (2017) Efficient processing of deep neural networks: a tutorial and survey. Proc IEEE 105(12):2295–2329CrossRef
2.
Zurück zum Zitat Pau LF (1991) Artificial intelligence and financial services. IEEE Trans Knowl Data Eng 3(2):137–148CrossRef Pau LF (1991) Artificial intelligence and financial services. IEEE Trans Knowl Data Eng 3(2):137–148CrossRef
3.
Zurück zum Zitat Yao X, Zhou J, Zhang J, Boer CR (2017) From intelligent manufacturing to smart manufacturing for industry 4.0 driven by next generation artificial intelligence and further on. In: 5th International Conference on Enterprise Systems (ES) Yao X, Zhou J, Zhang J, Boer CR (2017) From intelligent manufacturing to smart manufacturing for industry 4.0 driven by next generation artificial intelligence and further on. In: 5th International Conference on Enterprise Systems (ES)
4.
Zurück zum Zitat Bishnoi L, Narayan Singh S (2018) Artificial intelligence techniques used in medical sciences: a review. In: 8th International Conference on Cloud Computing, Data Science and Engineering (Confluence), pp 106–113 Bishnoi L, Narayan Singh S (2018) Artificial intelligence techniques used in medical sciences: a review. In: 8th International Conference on Cloud Computing, Data Science and Engineering (Confluence), pp 106–113
5.
Zurück zum Zitat Parker DS (1989) Integrating AI and DBMS through stream processing. In: Proceedings of Fifth International Conference on Data Engineering Parker DS (1989) Integrating AI and DBMS through stream processing. In: Proceedings of Fifth International Conference on Data Engineering
6.
Zurück zum Zitat Fraley JB, Cannady J (2017) The promise of machine learning in cybersecurity. SoutheastCon Fraley JB, Cannady J (2017) The promise of machine learning in cybersecurity. SoutheastCon
7.
Zurück zum Zitat Farabet C, Poulet C, Han JY, LeCun Y (2009). CNP: an FPGA-based processor for convolutional networks. Presented at the 2009 International Conference on Field Programmable Logic and Applications (FPL) Farabet C, Poulet C, Han JY, LeCun Y (2009). CNP: an FPGA-based processor for convolutional networks. Presented at the 2009 International Conference on Field Programmable Logic and Applications (FPL)
8.
Zurück zum Zitat Rao Q, Frtunikj J (2018) Deep learning for self-driving cars. In: Proceedings of the 1st International Workshop on Software Engineering for AI in Autonomous Systems—SEFAIS ’18 Rao Q, Frtunikj J (2018) Deep learning for self-driving cars. In: Proceedings of the 1st International Workshop on Software Engineering for AI in Autonomous Systems—SEFAIS ’18
9.
Zurück zum Zitat Duffany JL (2010) Artificial intelligence in GPS navigation systems. Presented at the 2010 2nd International Conference on Software Technology and Engineering (ICSTE 2010) Duffany JL (2010) Artificial intelligence in GPS navigation systems. Presented at the 2010 2nd International Conference on Software Technology and Engineering (ICSTE 2010)
10.
Zurück zum Zitat Schutzer D (1983) Applications of artificial intelligence to military communications. In: IEEE Military Communications Conference, pp 786–790 Schutzer D (1983) Applications of artificial intelligence to military communications. In: IEEE Military Communications Conference, pp 786–790
11.
Zurück zum Zitat Misra J, Saha I (2010) Artificial neural networks in hardware: a survey of two decades of progress. Neurocomputing 74(1–3):239–255CrossRef Misra J, Saha I (2010) Artificial neural networks in hardware: a survey of two decades of progress. Neurocomputing 74(1–3):239–255CrossRef
12.
Zurück zum Zitat Baji T (2018) Evolution of the GPU device widely used in AI and massive parallel processing. In: IEEE 2nd Electron Devices Technology and Manufacturing Conference (EDTM) Baji T (2018) Evolution of the GPU device widely used in AI and massive parallel processing. In: IEEE 2nd Electron Devices Technology and Manufacturing Conference (EDTM)
13.
Zurück zum Zitat Jawandhiya P (2018) Hardware design for machine learning. Int J Artif Intell Appl (IJAIA) 9(1):63–84 Jawandhiya P (2018) Hardware design for machine learning. Int J Artif Intell Appl (IJAIA) 9(1):63–84
14.
Zurück zum Zitat Shawahna A, Sait SM, El-Maleh A (2019) FPGA-based accelerators of deep learning networks for learning and classification: a review. IEEE Access 7:7823–7859CrossRef Shawahna A, Sait SM, El-Maleh A (2019) FPGA-based accelerators of deep learning networks for learning and classification: a review. IEEE Access 7:7823–7859CrossRef
15.
Zurück zum Zitat Lucas SM (2009) Computational intelligence and AI in games: a new IEEE transaction. IEEE Trans Comput Intell AI Games 1(1):1–3CrossRef Lucas SM (2009) Computational intelligence and AI in games: a new IEEE transaction. IEEE Trans Comput Intell AI Games 1(1):1–3CrossRef
16.
Zurück zum Zitat Rigos S (2012) A hardware acceleration unit for face detection. In: Mediterranean Conference on Embedded Computing (MECO), Bar, pp 17–21 Rigos S (2012) A hardware acceleration unit for face detection. In: Mediterranean Conference on Embedded Computing (MECO), Bar, pp 17–21
17.
Zurück zum Zitat Mittal S (2018) A survey of FPGA-based accelerators for convolutional neural networks. Neural Comput Appl 32(4):1109–1139CrossRef Mittal S (2018) A survey of FPGA-based accelerators for convolutional neural networks. Neural Comput Appl 32(4):1109–1139CrossRef
18.
Zurück zum Zitat Guo K, Zeng S, Yu J, Wang Y, Yang H (2019) [DL] A survey of FPGA-based neural network inference accelerators. ACM Trans Reconfig Technol Syst 12(1):1–26CrossRef Guo K, Zeng S, Yu J, Wang Y, Yang H (2019) [DL] A survey of FPGA-based neural network inference accelerators. ACM Trans Reconfig Technol Syst 12(1):1–26CrossRef
19.
Zurück zum Zitat Wang T, Wang C, Zhou X, Chen H (2018) A survey of FPGA based deep learning accelerators: challenges and opportunities. arXiv preprint arXiv:1901.04988 Wang T, Wang C, Zhou X, Chen H (2018) A survey of FPGA based deep learning accelerators: challenges and opportunities. arXiv preprint arXiv:​1901.​04988
20.
Zurück zum Zitat Budgen D, Brereton P (2006) Performing systematic literature reviews in software engineering. In: Proceeding of the 28th International Conference on Software Engineering—ICSE ’06 Budgen D, Brereton P (2006) Performing systematic literature reviews in software engineering. In: Proceeding of the 28th International Conference on Software Engineering—ICSE ’06
21.
Zurück zum Zitat Ma Y, Cao Y, Vrudhula S, Seo J (2017) An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks. In: 27th International Conference on Field Programmable Logic and Applications (FPL) Ma Y, Cao Y, Vrudhula S, Seo J (2017) An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks. In: 27th International Conference on Field Programmable Logic and Applications (FPL)
22.
Zurück zum Zitat Nurvitadhi E, Venkatesh G, Sim J, Marr D, Huang R, Ong Gee Hock J, Liew YT, Srivatsan K, Moss D, Subhaschandra S, Boudoukh G (2017) Can FPGAs beat GPUs in accelerating next-generation deep neural networks? In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays—FPGA ’17 Nurvitadhi E, Venkatesh G, Sim J, Marr D, Huang R, Ong Gee Hock J, Liew YT, Srivatsan K, Moss D, Subhaschandra S, Boudoukh G (2017) Can FPGAs beat GPUs in accelerating next-generation deep neural networks? In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays—FPGA ’17
24.
Zurück zum Zitat Faraone J, Gambardella G, Boland D, Fraser N, Blott M, Leong PHW (2018) Customizing low-precision deep neural networks for FPGAs. In: 28th International Conference on Field Programmable Logic and Applications (FPL) Faraone J, Gambardella G, Boland D, Fraser N, Blott M, Leong PHW (2018) Customizing low-precision deep neural networks for FPGAs. In: 28th International Conference on Field Programmable Logic and Applications (FPL)
25.
Zurück zum Zitat Cheng Kwang-Ting, Wang Yi-Chu (2011) Using mobile GPU for general-purpose computing; a case study of face recognition on smartphones. In: Proceedings of 2011 International Symposium on VLSI Design, Automation and Test Cheng Kwang-Ting, Wang Yi-Chu (2011) Using mobile GPU for general-purpose computing; a case study of face recognition on smartphones. In: Proceedings of 2011 International Symposium on VLSI Design, Automation and Test
26.
Zurück zum Zitat Ouerhani Y, Jridi M, AlFalou A (2010) Fast face recognition approach using a graphical processing unit “GPU”. In: IEEE International Conference on Imaging Systems and Techniques Ouerhani Y, Jridi M, AlFalou A (2010) Fast face recognition approach using a graphical processing unit “GPU”. In: IEEE International Conference on Imaging Systems and Techniques
27.
Zurück zum Zitat Li E, Wang B, Yang L, Peng Y, Du Y, Zhang Y, Chiu Y-J (2012) GPU and CPU cooperative acceleration for face detection on modern processors. Presented at the 2012 IEEE International Conference on Multimedia and Expo (ICME) Li E, Wang B, Yang L, Peng Y, Du Y, Zhang Y, Chiu Y-J (2012) GPU and CPU cooperative acceleration for face detection on modern processors. Presented at the 2012 IEEE International Conference on Multimedia and Expo (ICME)
28.
Zurück zum Zitat Shah AA, Zaidi ZA, Chowdhry BS, Daudpoto J (2016) Real time face detection/monitor using raspberry pi and MATLAB. In: IEEE 10th International Conference on Application of Information and Communication Technologies (AICT) Shah AA, Zaidi ZA, Chowdhry BS, Daudpoto J (2016) Real time face detection/monitor using raspberry pi and MATLAB. In: IEEE 10th International Conference on Application of Information and Communication Technologies (AICT)
29.
Zurück zum Zitat Oro D, Fernandez C, Saeta JR, Martorell X, Hernando J (2011) Real-time GPU-based face detection in HD video sequences. In: IEEE International Conference on Computer Vision Workshops (ICCV Workshops) Oro D, Fernandez C, Saeta JR, Martorell X, Hernando J (2011) Real-time GPU-based face detection in HD video sequences. In: IEEE International Conference on Computer Vision Workshops (ICCV Workshops)
30.
Zurück zum Zitat Gao C, Lu SL (2008) Novel FPGA based Haar classifier face detection algorithm acceleration. Presented at the 2008 International Conference on Field Programmable Logic and Applications (FPL) Gao C, Lu SL (2008) Novel FPGA based Haar classifier face detection algorithm acceleration. Presented at the 2008 International Conference on Field Programmable Logic and Applications (FPL)
31.
Zurück zum Zitat Cho J, Mirzaei S, Oberg J, Kastner R (2009) FPGA-based face detection system using Haar classifiers. In: Proceeding of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays—FPGA ’09 Cho J, Mirzaei S, Oberg J, Kastner R (2009) FPGA-based face detection system using Haar classifiers. In: Proceeding of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays—FPGA ’09
32.
Zurück zum Zitat He C, Papakonstantinou A, Chen D (2009) A novel SoC architecture on FPGA for ultra fast face detection. Presented at the 2009 IEEE International Conference on Computer Design (ICCD 2009) He C, Papakonstantinou A, Chen D (2009) A novel SoC architecture on FPGA for ultra fast face detection. Presented at the 2009 IEEE International Conference on Computer Design (ICCD 2009)
33.
Zurück zum Zitat Farrugia N, Mamalet F, Roux S, Fan Yang, Paindavoine M (2009) Fast and robust face detection on a parallel optimized architecture implemented on FPGA. IEEE Trans Circuits Syst Video Technol 19(4):597–602CrossRef Farrugia N, Mamalet F, Roux S, Fan Yang, Paindavoine M (2009) Fast and robust face detection on a parallel optimized architecture implemented on FPGA. IEEE Trans Circuits Syst Video Technol 19(4):597–602CrossRef
34.
Zurück zum Zitat Farabet C, Poulet C, LeCun Y (2009) An FPGA-based stream processor for embedded real-time vision with convolutional networks. In: IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops Farabet C, Poulet C, LeCun Y (2009) An FPGA-based stream processor for embedded real-time vision with convolutional networks. In: IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops
35.
Zurück zum Zitat Kyrkou C, Theocharides T (2011) A flexible parallel hardware architecture for AdaBoost-based real-time object detection. IEEE Trans Very Large Scale Integr (VLSI) Syst 19(6):1034–1047 Kyrkou C, Theocharides T (2011) A flexible parallel hardware architecture for AdaBoost-based real-time object detection. IEEE Trans Very Large Scale Integr (VLSI) Syst 19(6):1034–1047
36.
Zurück zum Zitat Zhou W, Zou Y, Dai L, Zeng X (2011) A high speed reconfigurable face detection architecture. Presented at the 2011 IEEE 9th International Conference on ASIC (ASICON 2011) Zhou W, Zou Y, Dai L, Zeng X (2011) A high speed reconfigurable face detection architecture. Presented at the 2011 IEEE 9th International Conference on ASIC (ASICON 2011)
37.
Zurück zum Zitat Wang N-J, Chang S-C, Chou P-J (2012) A real-time multi-face detection system implemented on FPGA. Presented at the 2012 International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS 2012) Wang N-J, Chang S-C, Chou P-J (2012) A real-time multi-face detection system implemented on FPGA. Presented at the 2012 International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS 2012)
38.
Zurück zum Zitat Bauer S, Brunsmann U, Schlotterbeck-Macht S (2009) FPGA implementation of a HOG-based pedestrian recognition system. In: MPC Workshop, pp 49–58 Bauer S, Brunsmann U, Schlotterbeck-Macht S (2009) FPGA implementation of a HOG-based pedestrian recognition system. In: MPC Workshop, pp 49–58
39.
Zurück zum Zitat Hiromoto M, Miyamoto R (2009) Hardware architecture for high-accuracy real-time pedestrian detection with CoHOG features. In: IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops Hiromoto M, Miyamoto R (2009) Hardware architecture for high-accuracy real-time pedestrian detection with CoHOG features. In: IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops
40.
Zurück zum Zitat Bauer S, Kohler S, Doll K, Brunsmann U (2010) FPGA-GPU architecture for kernel SVM pedestrian detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition—Workshops Bauer S, Kohler S, Doll K, Brunsmann U (2010) FPGA-GPU architecture for kernel SVM pedestrian detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition—Workshops
41.
Zurück zum Zitat Kryjak T, Komorkiewicz M, Gorgon M (2012) FPGA implementation of real-time headshoulder detection using local binary patterns, SVM and foreground object detection. In: Conference on Design and Architectures for Signal and Image Processing (DASIP), pp 1–8 Kryjak T, Komorkiewicz M, Gorgon M (2012) FPGA implementation of real-time headshoulder detection using local binary patterns, SVM and foreground object detection. In: Conference on Design and Architectures for Signal and Image Processing (DASIP), pp 1–8
42.
Zurück zum Zitat Sharma B, Thota R, Vydyanathan N, Kale A (2009) Towards a robust, real-time face processing system using CUDA-enabled GPUs. In: International Conference on High Performance Computing (HiPC) Sharma B, Thota R, Vydyanathan N, Kale A (2009) Towards a robust, real-time face processing system using CUDA-enabled GPUs. In: International Conference on High Performance Computing (HiPC)
43.
Zurück zum Zitat Kong J, Deng Y (2010) GPU accelerated face detection. In: International Conference on Intelligent Control and Information Processing Kong J, Deng Y (2010) GPU accelerated face detection. In: International Conference on Intelligent Control and Information Processing
44.
Zurück zum Zitat Hefenbrock D, Oberg J, Thanh NTN, Kastner R, Baden SB (2010) Accelerating Viola-Jones face detection to FPGA-level using GPUs. In: 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines Hefenbrock D, Oberg J, Thanh NTN, Kastner R, Baden SB (2010) Accelerating Viola-Jones face detection to FPGA-level using GPUs. In: 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines
45.
Zurück zum Zitat Masek J, Burget R, Uher V, Guney S (2013) Speeding up Viola-Jones algorithm using multi-Core GPU implementation. Presented at the 2013 36th International Conference on Telecommunications and Signal Processing (TSP) Masek J, Burget R, Uher V, Guney S (2013) Speeding up Viola-Jones algorithm using multi-Core GPU implementation. Presented at the 2013 36th International Conference on Telecommunications and Signal Processing (TSP)
46.
Zurück zum Zitat Jain V, Patel D (2016) A GPU based implementation of robust face detection system. Procedia Comput Sci 87:156–163CrossRef Jain V, Patel D (2016) A GPU based implementation of robust face detection system. Procedia Comput Sci 87:156–163CrossRef
47.
Zurück zum Zitat Lescano G, Santana P, Costaguta R (2017) Analysis of a GPU implementation of Viola-Jones’ algorithm for features selection. J Comput Sci Technol 17(1):68–73 Lescano G, Santana P, Costaguta R (2017) Analysis of a GPU implementation of Viola-Jones’ algorithm for features selection. J Comput Sci Technol 17(1):68–73
48.
Zurück zum Zitat Hahnle M, Saxen F, Hisung M, Brunsmann U, Doll K (2013) FPGA-based real-time pedestrian detection on high-resolution images. In: Proceedings IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp 629–635 Hahnle M, Saxen F, Hisung M, Brunsmann U, Doll K (2013) FPGA-based real-time pedestrian detection on high-resolution images. In: Proceedings IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp 629–635
49.
Zurück zum Zitat Komorkiewicz M, Kluczewski M, Gorgon M (2012) Floating point HOG implementation for real-time multiple object detection. Presented at the 2012 22nd International Conference on Field Programmable Logic and Applications (FPL) Komorkiewicz M, Kluczewski M, Gorgon M (2012) Floating point HOG implementation for real-time multiple object detection. Presented at the 2012 22nd International Conference on Field Programmable Logic and Applications (FPL)
50.
Zurück zum Zitat Ma X, Najjar WA, Roy-Chowdhury AK (2015) Evaluation and acceleration of high-throughput fixed-point object detection on FPGAs. IEEE Trans Circuits Syst Video Technol 25(6):1051–1062CrossRef Ma X, Najjar WA, Roy-Chowdhury AK (2015) Evaluation and acceleration of high-throughput fixed-point object detection on FPGAs. IEEE Trans Circuits Syst Video Technol 25(6):1051–1062CrossRef
51.
Zurück zum Zitat Dwith CYN, Rathna GN (2012) Parallel implementation of LBP based face recognition on GPU using OpenCL. In: The International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), pp 755–760 Dwith CYN, Rathna GN (2012) Parallel implementation of LBP based face recognition on GPU using OpenCL. In: The International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), pp 755–760
52.
Zurück zum Zitat Oh C, Yi S, Yi Y (2015) Real-time face detection in full HD images exploiting both embedded CPU and GPU. Presented at the 2015 IEEE International Conference on Multimedia and Expo (ICME) Oh C, Yi S, Yi Y (2015) Real-time face detection in full HD images exploiting both embedded CPU and GPU. Presented at the 2015 IEEE International Conference on Multimedia and Expo (ICME)
53.
Zurück zum Zitat Oh C, Yi S, Yi Y (2018) Real-time and energy-efficient face detection on CPU-GPU heterogeneous embedded platforms. IEICE Trans Inf Syst E 101(12):2878–2888CrossRef Oh C, Yi S, Yi Y (2018) Real-time and energy-efficient face detection on CPU-GPU heterogeneous embedded platforms. IEICE Trans Inf Syst E 101(12):2878–2888CrossRef
54.
Zurück zum Zitat Negi K, Dohi K, Shibata Y, Oguri K (2011) Deep pipelined one-chip FPGA implementation of a real-time image-based human detection algorithm. In: International Conference on Field-Programmable Technology Negi K, Dohi K, Shibata Y, Oguri K (2011) Deep pipelined one-chip FPGA implementation of a real-time image-based human detection algorithm. In: International Conference on Field-Programmable Technology
55.
Zurück zum Zitat Zhao J, Zhu S, Huang X (2013) Real-time traffic sign detection using SURF features on FPGA. In: IEEE High Performance Extreme Computing Conference (HPEC) Zhao J, Zhu S, Huang X (2013) Real-time traffic sign detection using SURF features on FPGA. In: IEEE High Performance Extreme Computing Conference (HPEC)
56.
Zurück zum Zitat Nasse F, Thurau C, Fink GA (2009) Face detection using GPU-based convolutional neural networks. In Proceedings of the 13th international conference on computer analysis of images and patterns. Springer, Berlin, pp 83–90 Nasse F, Thurau C, Fink GA (2009) Face detection using GPU-based convolutional neural networks. In Proceedings of the 13th international conference on computer analysis of images and patterns. Springer, Berlin, pp 83–90
57.
Zurück zum Zitat Li H, Lin Z, Shen X, Brandt J, Hua G (2015) A convolutional neural network cascade for face detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 5325–5334 Li H, Lin Z, Shen X, Brandt J, Hua G (2015) A convolutional neural network cascade for face detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 5325–5334
58.
Zurück zum Zitat Cengil E, Cinar A, Guler Z (2017) A GPU-based convolutional neural network approach for image classification. Presented at the 2017 International Artificial Intelligence and Data Processing Symposium (IDAP) Cengil E, Cinar A, Guler Z (2017) A GPU-based convolutional neural network approach for image classification. Presented at the 2017 International Artificial Intelligence and Data Processing Symposium (IDAP)
59.
Zurück zum Zitat Tijtgat N, Ranst WV, Volckaert B, Goedeme T, Turck FD (2017) Embedded real-time object detection for a UAV warning system. ICCVW. Venice, Italy, pp 2110–2118 Tijtgat N, Ranst WV, Volckaert B, Goedeme T, Turck FD (2017) Embedded real-time object detection for a UAV warning system. ICCVW. Venice, Italy, pp 2110–2118
60.
Zurück zum Zitat Berjon D, Cuevas C, Moran F, Garcia N (2013) GPU-based implementation of an optimized nonparametric background modeling for real-time moving object detection. IEEE Trans Consum Electron 59(2):361–369CrossRef Berjon D, Cuevas C, Moran F, Garcia N (2013) GPU-based implementation of an optimized nonparametric background modeling for real-time moving object detection. IEEE Trans Consum Electron 59(2):361–369CrossRef
61.
Zurück zum Zitat Obukhov A (2011) Haar classifiers for object detection with CUDA. In: GPU computing gems, Emerald Edition. Elsevier, pp 517–544 Obukhov A (2011) Haar classifiers for object detection with CUDA. In: GPU computing gems, Emerald Edition. Elsevier, pp 517–544
62.
Zurück zum Zitat Pertsau D, Uvarov A (2013) Face detection algorithm using Haar-like feature for GPU architecture. In: IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS) Pertsau D, Uvarov A (2013) Face detection algorithm using Haar-like feature for GPU architecture. In: IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS)
63.
Zurück zum Zitat Coates A, Baumstarck P, Le Q, Ng AY (2009) Scalable learning for object detection with GPU hardware. In: IEEE/RSJ International Conference on Intelligent Robots and Systems Coates A, Baumstarck P, Le Q, Ng AY (2009) Scalable learning for object detection with GPU hardware. In: IEEE/RSJ International Conference on Intelligent Robots and Systems
64.
Zurück zum Zitat Oro D, Fern’ndez C, Segura C, Martorell X, Hernando J (2012) Accelerating boosting-based face detection on GPUs. In: 41st International Conference on Parallel Processing Oro D, Fern’ndez C, Segura C, Martorell X, Hernando J (2012) Accelerating boosting-based face detection on GPUs. In: 41st International Conference on Parallel Processing
65.
Zurück zum Zitat Herout A, Jošth R, Juránek R, Havel J, Hradiš M, Zemčík P (2010) Real-time object detection on CUDA. J Real-Time Image Proc 6(3):159–170CrossRef Herout A, Jošth R, Juránek R, Havel J, Hradiš M, Zemčík P (2010) Real-time object detection on CUDA. J Real-Time Image Proc 6(3):159–170CrossRef
66.
Zurück zum Zitat Zhuang H, Low K-S, Yau W-Y (2012) Multichannel pulse-coupled-neural-network-based color image segmentation for object detection. IEEE Trans Ind Electron 59(8):3299–3308CrossRef Zhuang H, Low K-S, Yau W-Y (2012) Multichannel pulse-coupled-neural-network-based color image segmentation for object detection. IEEE Trans Ind Electron 59(8):3299–3308CrossRef
67.
Zurück zum Zitat Lozano OM, Otsuka K (2008) Simultaneous and fast 3D tracking of multiple faces in video by GPU-based stream processing. In: IEEE International Conference on Acoustics. Speech and Signal Processing, ICASSP, p 2008 Lozano OM, Otsuka K (2008) Simultaneous and fast 3D tracking of multiple faces in video by GPU-based stream processing. In: IEEE International Conference on Acoustics. Speech and Signal Processing, ICASSP, p 2008
68.
Zurück zum Zitat Possa PR, Mahmoudi SA, Harb N, Valderrama C, Manneback P (2014) A multi-resolution FPGA-based architecture for real-time edge and corner detection. IEEE Trans Comput 63(10):2376–2388MathSciNetCrossRef Possa PR, Mahmoudi SA, Harb N, Valderrama C, Manneback P (2014) A multi-resolution FPGA-based architecture for real-time edge and corner detection. IEEE Trans Comput 63(10):2376–2388MathSciNetCrossRef
69.
Zurück zum Zitat Barbosa JPF, Ferreira APA, Rocha RCF, Albuquerque ES, Reis JR, Albuquerque DS, Barros ENS (2015) A high performance hardware accelerator for dynamic texture segmentation. J Syst Archit 61(10):639–645CrossRef Barbosa JPF, Ferreira APA, Rocha RCF, Albuquerque ES, Reis JR, Albuquerque DS, Barros ENS (2015) A high performance hardware accelerator for dynamic texture segmentation. J Syst Archit 61(10):639–645CrossRef
70.
Zurück zum Zitat Kryjak T, Komorkiewicz M, Gorgon M (2012) Real-time background generation and foreground object segmentation for high-definition colour video stream in FPGA device. J Real-Time Image Proc 9(1):61–77CrossRef Kryjak T, Komorkiewicz M, Gorgon M (2012) Real-time background generation and foreground object segmentation for high-definition colour video stream in FPGA device. J Real-Time Image Proc 9(1):61–77CrossRef
71.
Zurück zum Zitat Park J, Sung W (2016) FPGA based implementation of deep neural networks using on-chip memory only. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Park J, Sung W (2016) FPGA based implementation of deep neural networks using on-chip memory only. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
72.
Zurück zum Zitat Zhao M, Hu C, Wei F, Wang K, Wang C, Jiang Y (2019) Real-time underwater image recognition with FPGA embedded system for convolutional neural network. Sensors 19(2):350CrossRef Zhao M, Hu C, Wei F, Wang K, Wang C, Jiang Y (2019) Real-time underwater image recognition with FPGA embedded system for convolutional neural network. Sensors 19(2):350CrossRef
73.
Zurück zum Zitat Zhang T, Zhou W, Jiang X, Liu Y (2018) FPGA-based implementation of hand gesture recognition using convolutional neural network. Presented at the 2018 IEEE International Conference on Cyborg and Bionic Systems (CBS) Zhang T, Zhou W, Jiang X, Liu Y (2018) FPGA-based implementation of hand gesture recognition using convolutional neural network. Presented at the 2018 IEEE International Conference on Cyborg and Bionic Systems (CBS)
74.
Zurück zum Zitat Reyes E, Gómez C, Norambuena E, Ruiz-del-Solar J (2019) Near real-time object recognition for pepper based on deep neural networks running on a backpack. In: RoboCup 2018: Robot World Cup XXII. Springer, pp 287–298 Reyes E, Gómez C, Norambuena E, Ruiz-del-Solar J (2019) Near real-time object recognition for pepper based on deep neural networks running on a backpack. In: RoboCup 2018: Robot World Cup XXII. Springer, pp 287–298
75.
Zurück zum Zitat Zhou Y, Wang W, Huang X (2015) FPGA design for PCANet deep learning network. In: IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines Zhou Y, Wang W, Huang X (2015) FPGA design for PCANet deep learning network. In: IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines
76.
Zurück zum Zitat Hikawa H, Kaida K (2015) Novel FPGA implementation of hand sign recognition system with SOM-Hebb classifier. IEEE Trans Circuits Syst Video Technol 25(1):153–166CrossRef Hikawa H, Kaida K (2015) Novel FPGA implementation of hand sign recognition system with SOM-Hebb classifier. IEEE Trans Circuits Syst Video Technol 25(1):153–166CrossRef
77.
Svab J, Krajnik T, Faigl J, Preucil L (2009) FPGA based speeded up robust features. Presented at the 2009 IEEE International Conference on Technologies for Practical Robot Applications (TePRA)
78.
Yao L, Feng H, Zhu Y, Jiang Z, Zhao D, Feng W (2009) An architecture of optimised SIFT feature detection for an FPGA implementation of an image matcher. In: International Conference on Field-Programmable Technology
79.
Gu Q, Takaki T, Ishii I (2013) Fast FPGA-based multiobject feature extraction. IEEE Trans Circuits Syst Video Technol 23(1):30–45
80.
Knag P, Kim JK, Chen T, Zhang Z (2015) A sparse coding neural network ASIC with on-chip learning for feature extraction and encoding. IEEE J Solid-State Circuits 50(4):1070–1079
81.
Bouris D, Nikitakis A, Papaefstathiou I (2010) Fast and efficient FPGA-based feature detection employing the SURF algorithm. Presented at the 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines
82.
Ali U, Malik MB, Munawar K (2009) FPGA/soft-processor based real-time object tracking system. In: 5th Southern Conference on Programmable Logic (SPL)
83.
Liu S, Papakonstantinou A, Wang H, Chen D (2011) Real-time object tracking system on FPGAs. Presented at the 2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC 2011)
84.
Kryjak T, Gorgon M (2013) Real-time implementation of the ViBe foreground object segmentation algorithm. In: Federated Conference on Computer Science and Information Systems (FedCSIS), pp 591–596
85.
Saqib F, Dutta A, Plusquellic J, Ortiz P, Pattichis MS (2015) Pipelined decision tree classification accelerator implementation in FPGA (DT-CAIF). IEEE Trans Comput 64(1):280–285
86.
Pan J, Lauterbach C, Manocha D (2010) g-Planner: real-time motion planning and global navigation using GPUs. In: Proceedings of AAAI Conference on Artificial Intelligence, pp 1245–1251
87.
Vasumathi B, Moorthi S (2012) Implementation of hybrid ANN-PSO algorithm on FPGA for harmonic estimation. Eng Appl Artif Intell 25(3):476–483
88.
Appleyard J, Kocisky T, Blunsom P (2016) Optimizing performance of recurrent neural networks on GPUs. arXiv preprint arXiv:1604.01946
89.
Wang Y, Xu J, Han Y, Li H, Li X (2016) DeepBurning: automatic generation of FPGA-based learning accelerators for the neural network family, pp 1–6
90.
Sharma H, Park J, Amaro E, Thwaites B, Kotha P, Gupta A, Kim Joon K, Mishra A, Esmaeilzadeh H (2016) DNNWeaver: from high-level deep network models to FPGA acceleration. In: Workshop on Cognitive Architectures
91.
DiCecco R, Lacey G, Vasiljevic J, Chow P, Taylor G, Areibi S (2016) Caffeinated FPGAs: FPGA framework for convolutional neural networks. In: International Conference on Field-Programmable Technology (FPT)
92.
Umuroglu Y, Fraser NJ, Gambardella G, Blott M, Leong P, Jahre M, Vissers K (2017) FINN: a framework for fast, scalable binarized neural network inference. In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA ’17)
93.
Geng T, Wang T, Sanaullah A, Yang C, Patel R, Herbordt M (2018) A framework for acceleration of CNN training on deeply-pipelined FPGA clusters with work and weight load balancing. Presented at the 2018 28th International Conference on Field Programmable Logic and Applications (FPL)
94.
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the ACM International Conference on Multimedia (MM ’14)
95.
Venieris SI, Bouganis C-S (2016) fpgaConvNet: a framework for mapping convolutional neural networks on FPGAs. In: IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
96.
Samragh M, Ghasemzadeh M, Koushanfar F (2017) Customizing neural networks for efficient FPGA implementation. Presented at the 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
97.
Liu Z, Dou Y, Jiang J, Xu J, Li S, Zhou Y, Xu Y (2017) Throughput-optimized FPGA accelerator for deep convolutional neural networks. ACM Trans Reconfig Technol Syst 10(3):1–23
98.
Guan Y, Liang H, Xu N, Wang W, Shi S, Chen X, Sun G, Zhang W, Cong J (2017) FP-DNN: an automated framework for mapping deep neural networks onto FPGAs with RTL-HLS hybrid templates. In: IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
99.
Wei X, Yu CH, Zhang P, Chen Y, Wang Y, Hu H, Cong J (2017) Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs. Presented at the 54th Annual Design Automation Conference 2017
100.
Zhao R, Ng H-C, Luk W, Niu X (2018) Towards efficient convolutional neural network for domain-specific applications on FPGA. In: 28th International Conference on Field Programmable Logic and Applications (FPL)
101.
Bottleson J, Kim S, Andrews J, Bindu P, Murthy DN, Jin J (2016) clCaffe: OpenCL accelerated Caffe for convolutional neural networks. In: IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
102.
Rabhi S, Sun W, Perez J, Kristensen MRB, Liu J, Oldridge E (2019) Accelerating recommender system training 15x with RAPIDS. In: Proceedings of the Workshop on ACM Recommender Systems Challenge. RecSys Challenge ’19: ACM Recommender Systems Challenge 2019 Workshop
103.
Gong J, Shen H, Zhang G, Liu X, Li S, Jin G, Maheshwari N, Fomenko E, Segal E (2018) Highly efficient 8-bit low precision inference of convolutional neural networks with IntelCaffe. In: Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-designing Pareto-efficient Deep Learning (ReQuEST ’18). Association for Computing Machinery, New York, NY, USA, Article 2, 1
104.
Abdelouahab K, Pelcat M, Serot J, Bourrasset C, Berry F (2017) Tactics to directly map CNN graphs on embedded FPGAs. IEEE Embed Syst Lett 9(4):113–116
105.
Sharma H et al (2016) From high-level deep neural models to FPGAs. In: 49th Annual IEEE/ACM International Symposium on Microarchitecture, pp 1–12
106.
Ma Y, Suda N, Cao Y, Vrudhula S, Seo JS (2018) ALAMO: FPGA acceleration of deep learning algorithms with a modularized RTL compiler. Integration 62:14–23
107.
Venieris SI (2017) Latency-driven design for FPGA-based convolutional neural networks
108.
Zeng H, Zhang C, Prasanna V (2018) Fast generation of high throughput customized deep learning accelerators on FPGAs. In: International Conference on Reconfigurable Computing FPGAs, ReConFig 2017, vol 2018-Janua, pp 1–8
109.
Venieris SI (2018) f-CNNx: a toolflow for mapping multiple convolutional neural networks on FPGAs
111.
Ma Y, Suda N, Cao Y, Seo JS, Vrudhula S (2016) Scalable and modularized RTL compilation of convolutional neural networks onto FPGA. In: 26th International Conference on Field-Programmable Logic and Applications (FPL)
112.
Cadambi S, Graf HP (2010) A programmable parallel accelerator for learning and classification, pp 273–283
113.
Art P (2011) Artificial neural network acceleration on FPGA using custom instruction, pp 450–455
114.
Luo G, Zhang C, Cong J, Sun J, Sun G, Wu D (2016) Energy-efficient CNN implementation on a deeply pipelined FPGA cluster, pp 326–331
115.
Sun F et al (2018) A high-performance accelerator for large-scale convolutional neural networks. In: Proceedings of the 15th IEEE International Symposium on International Parallel and Distributed Processing with Application. 16th IEEE International Conference on Ubiquitous Computing and Communications, ISPA/IUCC 2017, pp 622–629
116.
Qiao Y (2011) FPGA-accelerated deep convolutional neural networks for high throughput and energy efficiency. Seismol Res Lett 82(2):2010–2011
117.
Motamedi M, Gysel P, Akella V, Ghiasi S (2016) Design space exploration of FPGA-based deep convolutional neural networks. In: Proceedings of Asia and South Pacific Design Automation Conference, ASP-DAC, vol 25–28 Jan, pp 575–580
118.
Rahman A, Lee J, Choi K (2016) Efficient FPGA acceleration of convolutional neural networks using logical-3D compute array, pp 1393–1398
119.
Zhang J, Li J (2017) Improving the performance of OpenCL-based FPGA accelerator for convolutional neural network, pp 25–34
120.
Yonekawa H, Nakahara H (2017) On-chip memory based binarized convolutional deep neural network applying batch normalization free technique on an FPGA. In: Proceedings of IEEE 31st International Parallel and Distributed Processing Symposium Work, IPDPSW, pp 98–105
121.
Nakahara H, Fujii T, Sato S (2017) A fully connected layer elimination for a binarized convolutional neural network on an FPGA. In: 27th International Conference on Field-Programmable Logic and Applications (FPL), pp 1–4
122.
Kim L (2017) DeepX: deep learning accelerator for restricted Boltzmann machine artificial neural networks, pp 1–13
123.
Zhao R et al (2017) Accelerating binarized convolutional neural networks with software-programmable FPGAs, pp 15–24
124.
Aydonat U, O’Connell S, Capalija D, Ling AC, Chiu GR (2017) An OpenCL(TM) deep learning accelerator on Arria 10, pp 55–64
125.
Shimoda M, Sato S, Nakahara H (2018) All binarized convolutional neural network and its implementation on an FPGA. In: International Conference on Field-Programmable Technology, ICFPT, vol 2018-Janua, pp 291–294
126.
Xian A, Chang M, Culurciello E (2017) Hardware accelerators for recurrent neural networks on FPGA, pp 0–3
127.
Guo J, Yin S, Ouyang P, Liu L, Wei S (2017) Bit-width based resource partitioning for CNN acceleration on FPGA. In: Proceedings of IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines. FCCM 2017, p 31
128.
Zhang C, Prasanna V (2017) Frequency domain acceleration of convolutional neural networks on CPU-FPGA shared memory system, pp 35–44
129.
Yan S, Lu L, Liang Y, Xiao Q, Tai Y-W (2017) Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs, pp 1–6
130.
Gong L, Wang C, Li X, Chen X, Zhou X (2017) Work-in-progress: a power-efficient and high performance FPGA accelerator for convolutional neural networks
131.
Ma Y, Cao Y, Vrudhula S, Seo J (2017) Optimizing loop operation and dataflow in FPGA acceleration of deep convolutional neural networks, pp 45–54
132.
Nguyen D, Kim D, Lee J (2017) Double MAC: doubling the performance of convolutional neural networks on modern FPGAs. In: Proceedings of 2017 Design, Automation and Test in Europe Conference and Exhibition, pp 890–893
133.
Hwang WJ, Jhang YJ, Tai TM (2017) An efficient FPGA-based architecture for convolutional neural networks. In: 40th International Conference on Telecommunications and Signal Processing, TSP, vol 2017-Janua, pp 582–588
134.
Ma Y, Cao Y, Vrudhula S, Seo JS (2018) Optimizing the convolution operation to accelerate deep neural networks on FPGA. IEEE Trans Very Large Scale Integr Syst 26(7):1354–1367
135.
Guan Y, Yuan Z, Sun G, Cong J (2017) FPGA-based accelerator for long short-term memory recurrent neural networks. In: Proceedings of Asia and South Pacific Design Automation Conference, ASP-DAC, pp 629–634
136.
Ma Y, Kim M, Cao Y, Vrudhula S, Seo JS (2017) End-to-end scalable FPGA accelerator for deep residual networks. In: Proceedings of IEEE International Symposium on Circuits and Systems, pp 0–3
137.
Yu J et al (2018) Instruction driven cross-layer CNN accelerator with winograd transformation on FPGA. In: International Conference on Field-Programmable Technology, ICFPT 2017, vol 2018-Janua, pp 227–230
138.
Kim JH, Grady B, Lian B, Brothers J, Anderson JH (2017) FPGA-based CNN inference accelerator synthesized from multi-threaded C software, pp 268–273
139.
Moss DJM et al (2017) High performance binary neural networks on the Xeon+FPGA™ platform. In: 27th International Conference on Field-Programmable Logic and Applications (FPL)
140.
Guo K et al (2018) Angel-Eye: a complete design flow for mapping CNN onto embedded FPGA. IEEE Trans Comput Des Integr Circuits Syst 37(1):35–47
141.
Gong L, Wang C, Li X, Chen H, Zhou X (2018) MALOC: a fully pipelined FPGA accelerator for convolutional neural networks with all layers mapped on chip. IEEE Trans Comput Des Integr Circuits Syst 37(11):2601–2612
142.
Duarte RP (2018) Lite-CNN: a high-performance architecture to execute CNNs in low density FPGAs
143.
Rybalkin V, Pappalardo A, Ghaffar MM, Gambardella G, Wehn N, Blott M (2018) FINN-L: Library extensions and design trade-off analysis for variable precision LSTM networks on FPGAs. In: Proceedings of 2018 International Conference on Field-Programmable Logic and Applications (FPL), pp 89–96
144.
Yu Q, Wang C, Ma X, Li X, Zhou X (2015) A deep learning prediction process accelerator based FPGA. In: Proceedings of 2015 IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2015, no 500, pp 1159–1162
145.
Abdelfattah MS et al (2018) DLA: compiler and FPGA overlay for neural network inference acceleration
146.
Nurvitadhi E et al (2018) In-package domain-specific ASICs for Intel® Stratix® 10 FPGAs: a case study of accelerating deep learning using TensorTile ASIC, pp 106–110
147.
Zhang C (2015) Optimizing FPGA-based accelerator design for deep convolutional neural networks, pp 161–170
148.
Qiu J et al (2016) Going deeper with embedded FPGA platform for convolutional neural network, pp 26–35
149.
Vrudhula S et al (2016) Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks, pp 16–25
150.
Wang Y et al (2016) Low power convolutional neural networks on a chip. In: Proceedings of IEEE International Symposium on Circuits and Systems, vol 2016-July, no 1, pp 129–132
151.
Feng G, Hu Z, Chen S, Wu F (2016) Energy-efficient and high-throughput FPGA-based accelerator for convolutional neural networks, pp 4–6
152.
Wang C, Gong L, Yu Q, Li X, Xie Y, Zhou X (2017) DLAU: a scalable deep learning accelerator unit on FPGA. IEEE Trans Comput Des Integr Circuits Syst 36(3):513–517
153.
Park J, Lotfi-Kamran P, Sharma H, Esmaeilzadeh H, Yazdanbakhsh A (2016) Neural acceleration for GPU throughput processors, pp 482–493
154.
Strigl D, Kofler K, Podlipnig S (2010) Performance and scalability of GPU-based convolutional neural networks. In: Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, PDP 2010, pp 317–324
155.
Guzhva A, Dolenko S, Persiantsev I (2009) Multifold acceleration of neural network computations using GPU. In: Artificial Neural Networks – ICANN 2009, pp 373–380
156.
Li B, Zhou E, Huang B, Duan J, Wang Y, Xu N, Zhang J, Yang H (2014) Large scale recurrent neural network on GPU. In: International Joint Conference on Neural Networks (IJCNN)
157.
Kim Y, Lee J, Kim J-S, Jei H, Roh H (2018) Efficient multi-GPU memory management for deep learning acceleration. In: IEEE 3rd International Workshops on Foundations and Applications of Self* Systems (FAS*W)
158.
Bhuiyan MA, Pallipuram VK, Smith MC (2010) Acceleration of spiking neural networks in emerging multi-core and GPU architectures. In: IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum (IPDPSW)
159.
Zhang X, Gu N, Ye H (2016) Multi-GPU based recurrent neural networks language model training. In: Communications in computer and information science, pp 484–493
160.
Potluri S, Fasih A, Vutukuru LK, Machot FA, Kyamakya K (2011) CNN based high performance computing for real time image processing on GPU. Presented at the 16th Int’l Symposium on Theoretical Electrical Engineering (ISTET)
161.
Farah NICLA (2014) A new classification approach for neural networks hardware: from standards chips to embedded systems on chip, pp 491–534
162.
Jin L, Wang Z, Gu R, Yuan C, Huang Y (2014) Training large scale deep neural networks on the Intel Xeon Phi many-core coprocessor. In: IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
163.
Kurth T, Zhang J, Satish N, Racah E, Mitliagkas I, Patwary MMA, Malas T, Sundaram N, Bhimji W, Smorkalov M et al (2017) Deep learning at 15PF: supervised and semi-supervised classification for scientific data. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, pp 7
164.
Georganas E, Avancha S, Banerjee K, Kalamkar D, Henry G, Pabst H, Heinecke A (2018) Anatomy of high-performance deep learning convolutions on SIMD architectures. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC ’18, Piscataway, NJ, USA. IEEE Press, pp 66:1–66:12
165.
Viebke A, Memeti S, Pllana S, Abraham A (2017) CHAOS: a parallelization scheme for training convolutional neural networks on Intel Xeon Phi. J Supercomput 75(1):197–227
166.
Mathuriya A, Bard D, Mendygral P, Meadows L, Arnemann J, Shao L, He S, Karna T, Moise D, Pennycook SJ, Maschhoff K, Sewall J, Kumar N, Ho S, Ringenburg MF, Prabhat P, Lee V (2018) CosmoFlow: using deep learning to learn the universe at scale. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis
167.
Hu Y, Zhai J, Li D, Gong Y, Zhu Y, Liu W, Su L, Jin J (2018) BitFlow: exploiting vector parallelism for binary neural networks on CPU. Presented at the 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
171.
Chen Y et al (2014) DaDianNao: a machine-learning supercomputer
Metadata
Title
A systematic literature review on hardware implementation of artificial intelligence algorithms
Authors
Manar Abu Talib
Sohaib Majzoub
Qassim Nasir
Dina Jamal
Publication date
28 May 2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 2/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03325-8
