Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks

Authors:
Chen Zhang

Center for Energy-Efficient Computing and Applications, Peking University, Beijing, China

Peng Li

Computer Science Department, University of California, Los Angeles, Los Angeles, USA

Guangyu Sun

Center for Energy-Efficient Computing and Applications, Peking University & PKU/UCLA Joint Research Institute in Science and Engineering, Beijing, China

Yijin Guan

Center for Energy-Efficient Computing and Applications, Peking University, Beijing, China

Bingjun Xiao

Computer Science Department, University of California, Los Angeles, Los Angeles, USA

Jason Cong

Computer Science Department, University of California, Los Angeles, Los Angeles, USA & Center for Energy-Efficient Computing and Applications, Peking University & PKU/UCLA Joint Research Institute in Science and Engineering, Beijing, China


FPGA '15: Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, February 2015, Pages 161–170. https://doi.org/10.1145/2684746.2689060

Published: 22 February 2015

ABSTRACT

Convolutional neural networks (CNNs) have been widely employed for image recognition because they can achieve high accuracy by emulating the behavior of optic nerves in living creatures. Recently, the rapid growth of modern applications based on deep learning algorithms has further advanced research and implementations. In particular, various accelerators for deep CNNs have been proposed on FPGA platforms, which offer high performance, reconfigurability, and fast development cycles. Although current FPGA accelerators have demonstrated better performance than generic processors, the accelerator design space has not been well exploited. One critical problem is that the computation throughput may not match the memory bandwidth provided by an FPGA platform. Consequently, existing approaches cannot achieve the best performance because they under-utilize either logic resources or memory bandwidth. At the same time, the increasing complexity and scale of deep learning applications aggravate this problem. To overcome it, we propose an analytical design scheme using the roofline model. For any solution of a CNN design, we quantitatively analyze its computing throughput and required memory bandwidth using various optimization techniques, such as loop tiling and transformation. Then, with the help of the roofline model, we can identify the solution with the best performance and the lowest FPGA resource requirement. As a case study, we implement a CNN accelerator on a VC707 FPGA board and compare it to previous approaches. Our implementation achieves a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, which significantly outperforms previous approaches.
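
The selection step described in the abstract can be sketched with the standard roofline bound, under which attainable performance is the lower of the computational roof and the product of the computation-to-communication ratio and the memory bandwidth. The following is a minimal illustration, not the authors' implementation: the `Design` class, the `resource_cost` field, the tie-breaking rule, and every number are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """One candidate accelerator configuration (all fields are illustrative)."""
    name: str
    computational_roof_gflops: float  # peak throughput the compute engine could reach
    ctc_ratio: float                  # computation-to-communication ratio, FLOP per byte
    resource_cost: int                # hypothetical scalar standing in for FPGA resource usage


def attainable_gflops(design: Design, bandwidth_gb_per_s: float) -> float:
    """Roofline bound: the lower of the compute roof and the bandwidth-limited roof."""
    return min(design.computational_roof_gflops, design.ctc_ratio * bandwidth_gb_per_s)


def pick_best(designs, bandwidth_gb_per_s):
    """Highest attainable performance first; ties go to the cheaper design (an assumption)."""
    return max(
        designs,
        key=lambda d: (attainable_gflops(d, bandwidth_gb_per_s), -d.resource_cost),
    )


# Made-up candidates and bandwidth, for illustration only.
CANDIDATES = [
    Design("A", 80.0, 5.0, 900),   # high compute roof, but bandwidth-bound
    Design("B", 60.0, 20.0, 700),  # compute-bound
    Design("C", 60.0, 30.0, 500),  # compute-bound, fewer resources
]
BANDWIDTH_GB_PER_S = 4.0

if __name__ == "__main__":
    for d in CANDIDATES:
        print(d.name, attainable_gflops(d, BANDWIDTH_GB_PER_S))
    print("best:", pick_best(CANDIDATES, BANDWIDTH_GB_PER_S).name)
```

In this toy setting, design A has the highest computational roof but is limited by memory bandwidth, so a design with a lower roof and better data reuse wins. This is the kind of mismatch between computation throughput and memory bandwidth that the abstract identifies.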

Index Terms

1. Computer systems organization
  1. Embedded and cyber-physical systems
  2. Real-time systems

Published in
FPGA '15: Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2015
292 pages
ISBN: 9781450333153
DOI: 10.1145/2684746
General Chair: George A. Constantinides, Imperial College
Program Chair: Deming Chen, University of Illinois at Urbana-Champaign
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
acceleration
convolutional neural network
fpga
roofline model
Qualifiers
- research-article
Acceptance Rates
FPGA '15 Paper Acceptance Rate: 20 of 102 submissions, 20%. Overall Acceptance Rate: 125 of 627 submissions, 20%.