ABSTRACT
Convolutional neural networks (CNNs) find use in a variety of computer vision applications, ranging from object recognition and detection to scene understanding, owing to their exceptional accuracy. Different algorithms exist for computing CNNs. In this paper, we explore both the conventional convolution algorithm and a faster algorithm based on Winograd's minimal filtering theory for efficient FPGA implementation. Unlike the conventional convolution algorithm, the Winograd algorithm uses fewer computing resources but puts more pressure on memory bandwidth. We first propose a fusion architecture that naturally fuses multiple CNN layers, reusing the intermediate data between them. Based on this fusion architecture, we explore heterogeneous algorithms to maximize the throughput of a CNN. We design an optimal algorithm to determine the fusion and algorithm strategy for each layer. We also develop an automated toolchain that eases the mapping from a Caffe model to an FPGA bitstream using Vivado HLS. Experiments on the widely used VGG and AlexNet networks demonstrate that our design achieves up to 1.99X speedup over the prior fusion-based FPGA accelerator for CNNs.
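For concreteness, the sketch below illustrates the arithmetic behind the Winograd minimal-filtering approach the abstract refers to, for the common F(2x2, 3x3) case: a 2x2 output tile is computed from a 4x4 input tile with 16 element-wise multiplications instead of the 36 a direct 3x3 convolution would need. This is a minimal NumPy illustration using the standard F(2x2, 3x3) transform matrices, not the paper's HLS design.

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transform matrices.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float64)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]], dtype=np.float64)
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float64)

def winograd_f2x2_3x3(d, g):
    """One 2x2 output tile from a 4x4 input tile d and a 3x3 filter g,
    using 16 multiplies instead of the 36 needed by direct convolution."""
    U = G @ g @ G.T            # 4x4 transformed filter (precomputable per layer)
    V = BT @ d @ BT.T          # 4x4 transformed input tile
    return AT @ (U * V) @ AT.T # inverse transform yields the 2x2 output tile

# Sanity check against direct (sliding-window) convolution on one tile.
d = np.random.rand(4, 4)
g = np.random.rand(3, 3)
direct = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)]
                   for i in range(2)])
assert np.allclose(winograd_f2x2_3x3(d, g), direct)
```

In a full layer, the filter transform `G @ g @ G.T` can be precomputed once per filter, so the savings in multiplications come at the cost of moving larger transformed tiles, which is the compute-versus-bandwidth trade-off the abstract highlights.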