DOI: 10.1145/3290420.3290444
research-article

Efficient SIMD implementation for accelerating convolutional neural network

Published: 02 November 2018

ABSTRACT

Convolutional neural networks (CNNs) are used in a variety of fields such as computer vision, speech recognition, and natural language processing. Because their computational cost has grown tremendously, CNNs are now commonly run on accelerators such as graphics processing units (GPUs). However, resource-constrained embedded platforms such as Internet of Things (IoT) devices cannot afford such accelerators, so it is important to run CNNs efficiently on the CPU alone. In this paper, we propose a method to accelerate CNNs using the single instruction, multiple data (SIMD) unit that is integrated in many modern CPUs and commonly used for vector operations. The proposed method, implemented on ARM's NEON, can maximize the utilization of the vector registers in the SIMD unit. Our implementation achieves a speed-up of up to 2.66 times in execution time and an energy reduction of up to 3.55 times compared with the conventional implementation.
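The abstract does not describe the kernel itself, so the following is only an illustrative sketch of the general idea of vectorizing a convolution with NEON, not the authors' implementation. It assumes single-precision data, a square `k x k` kernel, unit stride, and no padding; the function names `conv2d_scalar` and `conv2d_simd` are hypothetical. The SIMD variant computes four adjacent output pixels per iteration in one 128-bit register and falls back to scalar code when NEON is unavailable.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Scalar reference: "valid" 2D convolution (cross-correlation form, as is
 * common in CNN code) of an in_h x in_w input with a k x k kernel.
 * The output has (in_h - k + 1) x (in_w - k + 1) elements. */
void conv2d_scalar(const float *in, size_t in_h, size_t in_w,
                   const float *ker, size_t k, float *out)
{
    assert(k > 0 && in_h >= k && in_w >= k);
    size_t out_h = in_h - k + 1, out_w = in_w - k + 1;
    for (size_t y = 0; y < out_h; ++y) {
        for (size_t x = 0; x < out_w; ++x) {
            float acc = 0.0f;
            for (size_t i = 0; i < k; ++i)
                for (size_t j = 0; j < k; ++j)
                    acc += in[(y + i) * in_w + (x + j)] * ker[i * k + j];
            out[y * out_w + x] = acc;
        }
    }
}

/* SIMD sketch: same result, but four neighbouring outputs of a row are
 * accumulated together in one vector register when NEON is available. */
void conv2d_simd(const float *in, size_t in_h, size_t in_w,
                 const float *ker, size_t k, float *out)
{
    assert(k > 0 && in_h >= k && in_w >= k);
    size_t out_h = in_h - k + 1, out_w = in_w - k + 1;
    for (size_t y = 0; y < out_h; ++y) {
        size_t x = 0;
#if HAVE_NEON
        for (; x + 4 <= out_w; x += 4) {
            float32x4_t acc = vdupq_n_f32(0.0f);
            for (size_t i = 0; i < k; ++i) {
                for (size_t j = 0; j < k; ++j) {
                    /* Load 4 consecutive inputs, multiply by one weight. */
                    float32x4_t v = vld1q_f32(&in[(y + i) * in_w + x + j]);
                    acc = vmlaq_n_f32(acc, v, ker[i * k + j]);
                }
            }
            vst1q_f32(&out[y * out_w + x], acc);
        }
#endif
        /* Scalar tail (and the whole row when NEON is not available). */
        for (; x < out_w; ++x) {
            float acc = 0.0f;
            for (size_t i = 0; i < k; ++i)
                for (size_t j = 0; j < k; ++j)
                    acc += in[(y + i) * in_w + (x + j)] * ker[i * k + j];
            out[y * out_w + x] = acc;
        }
    }
}
```

Both functions take row-major buffers; the caller allocates `out` with `(in_h - k + 1) * (in_w - k + 1)` floats. Vectorizing across output pixels rather than across kernel taps is one possible design choice: it avoids a horizontal reduction at the end of each dot product, which tends to keep the vector lanes busy even for small kernels.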

          Published in

          ICCIP '18: Proceedings of the 4th International Conference on Communication and Information Processing
          November 2018
          326 pages
          ISBN: 9781450365345
          DOI: 10.1145/3290420
          Conference Chairs: Jalel Ben-Othman, Hui Yu
          Program Chairs: Herwig Unger, Masayuki Arai

          Copyright © 2018 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall acceptance rate: 61 of 301 submissions, 20%