DOI: 10.1145/3061639.3062244

Exploring Heterogeneous Algorithms for Accelerating Deep Convolutional Neural Networks on FPGAs

Published: 18 June 2017

ABSTRACT

Convolutional neural networks (CNNs) are used in a variety of computer vision tasks, ranging from object recognition and detection to scene understanding, owing to their exceptional accuracy. Different algorithms exist for computing CNNs. In this paper, we explore the conventional convolution algorithm alongside a faster algorithm based on Winograd's minimal filtering theory for efficient FPGA implementation. Unlike the conventional convolution algorithm, the Winograd algorithm uses fewer computing resources but puts more pressure on memory bandwidth. We first propose a fusion architecture that can naturally fuse multiple CNN layers, reusing the intermediate data. Based on this fusion architecture, we explore heterogeneous algorithms to maximize the throughput of a CNN. We design an optimal algorithm to determine the fusion and algorithm strategy for each layer. We also develop an automated toolchain to ease the mapping from a Caffe model to an FPGA bitstream using Vivado HLS. Experiments on the widely used VGG and AlexNet networks demonstrate that our design achieves up to 1.99X speedup over the prior fusion-based FPGA accelerator for CNNs.
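To make the compute/bandwidth trade-off concrete, below is a minimal sketch (not from the paper's toolchain) of the 1-D Winograd minimal filtering instance F(2, 3): it produces two outputs of a 3-tap filter with 4 multiplications instead of the 6 a direct convolution needs, at the cost of extra additions and transformed data. The transform matrices B^T, G, and A^T are the standard ones from Winograd's construction; the function names are illustrative.

```python
# Sketch of 1-D Winograd minimal filtering F(2, 3): 2 outputs of a
# 3-tap filter from a 4-sample input tile, using 4 multiplications.
# Matrices follow the standard Winograd construction.
import numpy as np

BT = np.array([[1,  0, -1,  0],    # input transform B^T (4x4)
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G  = np.array([[1.0,  0.0,  0.0],  # filter transform G (4x3)
               [0.5,  0.5,  0.5],
               [0.5, -0.5,  0.5],
               [0.0,  0.0,  1.0]])
AT = np.array([[1, 1,  1,  0],     # output transform A^T (2x4)
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """Two outputs of a 3-tap convolution over a 4-sample input tile."""
    U = G @ g       # transform the filter (can be precomputed once per filter)
    V = BT @ d      # transform the input tile
    M = U * V       # 4 element-wise multiplications
    return AT @ M   # transform back to 2 outputs

def direct_conv(d, g):
    """Reference: direct 'valid' convolution, 6 multiplications."""
    return np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                     d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])

d = np.array([1.0, 2.0, 3.0, 4.0])  # input tile
g = np.array([0.5, 1.0, -1.0])      # 3-tap filter
assert np.allclose(winograd_f23(d, g), direct_conv(d, g))
print(winograd_f23(d, g))           # [-0.5  0.], same as direct convolution
```

Nesting this construction in 2-D yields F(2x2, 3x3), which replaces the 36 multiplications per 2x2 output tile of direct convolution with 16, a 2.25x arithmetic reduction; the larger transformed tiles are what raise memory-bandwidth pressure relative to the conventional algorithm.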

  • Published in

    DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017
    June 2017
    533 pages
    ISBN: 9781450349277
    DOI: 10.1145/3061639

    Copyright © 2017 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%
