An FPGA-Based Reconfigurable Convolutional Neural Network Accelerator for Tiny YOLO-V3

Authors: Tsung-Han Tsai, Nai-Chieh Tung, Chun-Yu Chen

Published in: Circuits, Systems, and Signal Processing


In recent years, deep learning has advanced rapidly, and neural networks are now applied across many domains. Applications such as mobile phone unlocking, intelligent customer service, and chatbots are increasingly part of daily life. However, neural networks require a large number of parameters and operations, so they typically must run on GPUs or on embedded development boards with CUDA acceleration. Designing efficient hardware architectures to accelerate neural networks has therefore become a critical research focus. In this paper, we propose a reconfigurable System on a Chip (SoC) hardware architecture that handles various input image sizes, kernel sizes, and stride sizes. The design supports 1 × 1 and 3 × 3 convolutions, batch normalization, activation functions, and max-pooling operations to accelerate neural network computations. The Programmable Logic (PL) and Processing System (PS) sides communicate over the AXI bus protocol: the PS side is responsible for data transfer and sequencing, while the PL side performs all computational tasks. To cope with limited on-chip memory and to reduce the number of data transfers and communications, we implement zero-padding directly in hardware. The proposed system implements the Tiny YOLO-V3 network on the Xilinx ZCU104 platform. Experimental results demonstrate that the system achieves a throughput of 42.496 GOPs and an energy efficiency of 8.57 GOPs/W.
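
The layer pipeline named in the abstract (convolution with hardware zero-padding, batch normalization, activation, max-pooling) can be illustrated with a small software model. The following is only a minimal sketch under stated assumptions, not the authors' hardware design: all function names are hypothetical, the data is single-channel floating point (an FPGA datapath would more likely use fixed-point arithmetic across many channels), batch normalization is assumed to be pre-folded into one scale and one shift, the activation is assumed to be a leaky ReLU with slope 0.1, and pooling is assumed to use a 2 × 2 window. The point it tries to show is that zero-padding can be produced on the fly by treating out-of-range reads as zero, so no padded copy of the image has to be stored or transferred.

```python
def conv2d_zero_pad(image, kernel, stride=1):
    """Single-channel 2D convolution with implicit zero-padding.

    Out-of-range reads contribute 0 instead of reading a padded copy,
    loosely mimicking padding generated inside the datapath (assumption).
    """
    h, w = len(image), len(image[0])
    k = len(kernel)          # square kernel, e.g. 1 or 3
    pad = k // 2             # padding that keeps the size at stride 1
    out = []
    for r in range(0, h, stride):
        row = []
        for c in range(0, w, stride):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    y, x = r + i - pad, c + j - pad
                    if 0 <= y < h and 0 <= x < w:  # outside: implicit zero
                        acc += image[y][x] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def batch_norm(fmap, scale, shift):
    """Batch normalization folded into one multiply-add (assumption)."""
    return [[v * scale + shift for v in row] for row in fmap]


def leaky_relu(fmap, slope=0.1):
    """Leaky ReLU; the slope value is an assumption, not from the paper."""
    return [[v if v > 0 else slope * v for v in row] for row in fmap]


def max_pool(fmap, size=2, stride=2):
    """Max-pooling over size x size windows without padding."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[r + i][c + j] for i in range(size) for j in range(size))
         for c in range(0, w - size + 1, stride)]
        for r in range(0, h - size + 1, stride)
    ]


def layer(image, kernel, scale, shift, stride=1):
    """One conv -> batch norm -> activation -> max-pool stage."""
    x = conv2d_zero_pad(image, kernel, stride)
    x = batch_norm(x, scale, shift)
    x = leaky_relu(x)
    return max_pool(x)


if __name__ == "__main__":
    ones = [[1.0] * 4 for _ in range(4)]
    k3 = [[1.0] * 3 for _ in range(3)]
    print(layer(ones, k3, scale=1.0, shift=0.0))
```

In this model the kernel size and stride are ordinary parameters, which loosely mirrors the reconfigurability described above; in the actual accelerator such settings would presumably be configured from the PS side, but the abstract does not give those details.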

