ABSTRACT
Recent advances in neural networks (NNs) are enabling more and more innovative applications. As an energy-efficient hardware solution, machine learning accelerators for CNNs and traditional ANNs are also gaining popularity in embedded vision, robotics, and cyber-physical systems. However, the design parameters of NN models vary significantly from application to application. Hence, it is hard to provide one general, highly efficient hardware solution that accommodates all of them, and it is also impractical for domain-specific developers to customize their own hardware for a specific NN model. To resolve this dilemma, this study proposes a design automation tool, DeepBurning, which allows application developers to build from scratch learning accelerators that target their specific NN models, with custom configurations and optimized performance. DeepBurning includes an RTL-level accelerator generator and a coordinated compiler that generates the control flow and data layout under user-specified constraints. The results can be used to implement FPGA-based NN accelerators or to guide chip design at an early design stage. In general, DeepBurning supports a large family of NN models and greatly simplifies the accelerator design flow for machine learning and AI application developers. The evaluation shows that the generated learning accelerators burnt to our FPGA board exhibit great power efficiency compared to state-of-the-art FPGA-based solutions.
Index Terms
- DeepBurning: automatic generation of FPGA-based learning accelerators for the neural network family