ABSTRACT
As convolution layers contribute most of the operations in convolutional neural network (CNN) algorithms, an effective convolution acceleration scheme significantly affects the efficiency and performance of a hardware CNN accelerator. Convolution in CNNs involves three-dimensional multiply-and-accumulate (MAC) operations organized in four levels of loops, which yields a large design space. Prior works either employ a limited set of loop optimization techniques, e.g., loop unrolling, tiling, and interchange, or tune only some of the design variables after the accelerator architecture and dataflow are already fixed. Without fully studying convolution loop optimization before the hardware design phase, the resulting accelerator can hardly exploit data reuse or manage data movement efficiently. This work overcomes these barriers by quantitatively analyzing and optimizing the design objectives (e.g., required memory access) of the CNN accelerator with respect to multiple design variables. We systematically explore the trade-offs in hardware cost by searching over design variable configurations, and propose a specific dataflow for hardware CNN acceleration that minimizes memory access and data movement while maximizing resource utilization to achieve high performance. The proposed CNN acceleration scheme and architecture are demonstrated on a standalone Altera Arria 10 GX 1150 FPGA by implementing the end-to-end VGG-16 CNN model, achieving 645.25 GOPS of throughput and 47.97 ms of latency, a more than 3.2× enhancement over state-of-the-art FPGA implementations of the VGG model.
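The four-level convolution loop nest that the abstract refers to can be sketched as follows. This is a minimal illustrative reference implementation, not the paper's accelerator dataflow; the loop labels, naming, and stride-1/no-padding assumptions are ours.

```python
def conv2d(inputs, weights):
    """Naive CNN convolution layer (stride 1, no padding).

    inputs:  [M][H][W]    -- M input feature maps of size H x W
    weights: [N][M][K][K] -- N kernels, each spanning M maps with a K x K window

    The four loop levels correspond to: output feature maps (n),
    input feature maps (m), output pixels (y, x), and the kernel
    window (ky, kx), which together form the MAC-heavy design
    space discussed in the abstract.
    """
    M, H, W = len(inputs), len(inputs[0]), len(inputs[0][0])
    N, K = len(weights), len(weights[0][0])
    Ho, Wo = H - K + 1, W - K + 1
    out = [[[0.0] * Wo for _ in range(Ho)] for _ in range(N)]
    for n in range(N):                  # loop level 4: output feature maps
        for m in range(M):              # loop level 3: input feature maps
            for y in range(Ho):         # loop level 2: output pixels
                for x in range(Wo):
                    for ky in range(K):  # loop level 1: kernel window
                        for kx in range(K):
                            out[n][y][x] += (inputs[m][y + ky][x + kx]
                                             * weights[n][m][ky][kx])
    return out
```

Loop unrolling, tiling, and interchange, the techniques named above, are transformations of exactly this nest: which loops are unrolled determines the parallel MAC array shape, and which are tiled determines on-chip buffer sizes and off-chip memory traffic.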