Abstract
Neural Networks (NN) are a family of models for a broad range of emerging machine learning and pattern recognition applications. NN techniques are conventionally executed on general-purpose processors (such as CPUs and GPGPUs), which are usually not energy-efficient, since they invest excessive hardware resources to flexibly support various workloads. Consequently, application-specific hardware accelerators for neural networks have recently been proposed to improve energy efficiency. However, such accelerators were designed for a small set of NN techniques sharing similar computational patterns, and they adopt complex and informative instructions (control signals) that directly correspond to high-level functional blocks of an NN (such as layers), or even to an NN as a whole. Although straightforward and easy to implement for a limited set of similar NN techniques, the lack of agility in the instruction set prevents such accelerator designs from supporting a variety of different NN techniques with sufficient flexibility and efficiency.
In this paper, we propose a novel domain-specific Instruction Set Architecture (ISA) for NN accelerators, called Cambricon, which is a load-store architecture that integrates scalar, vector, matrix, logical, data transfer, and control instructions, based on a comprehensive analysis of existing NN techniques. Our evaluation over a total of ten representative yet distinct NN techniques has demonstrated that Cambricon exhibits strong descriptive capacity over a broad range of NN techniques, and provides higher code density than general-purpose ISAs such as x86, MIPS, and GPGPU. Compared to the latest state-of-the-art NN accelerator design DaDianNao [5] (which can only accommodate 3 types of NN techniques), our Cambricon-based accelerator prototype implemented in TSMC 65nm technology incurs only negligible latency/power/area overheads, with a versatile coverage of 10 different NN benchmarks.
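To make the abstract's instruction categories concrete, the following is a minimal, hypothetical sketch of how a fully connected layer y = sigmoid(W·x + b) might be expressed and interpreted under a Cambricon-style load-store ISA. The mnemonics (LOAD, MMV, VAV, VSIG, STORE) and the register/scratchpad model are invented for illustration; the actual Cambricon encoding and semantics are defined in the paper itself.

```python
# Hypothetical interpreter for a Cambricon-style load-store NN ISA.
# Instruction categories mirror those named in the abstract:
#   data transfer (LOAD/STORE), matrix (MMV), vector (VAV/VSIG).
# All mnemonics are invented for this sketch.
import math

def run(program, memory):
    regs = {}  # symbolic on-chip scratchpad registers
    for op, dst, *src in program:
        if op == "LOAD":    # data transfer: main memory -> scratchpad
            regs[dst] = memory[src[0]]
        elif op == "MMV":   # matrix instruction: matrix-times-vector
            m, v = regs[src[0]], regs[src[1]]
            regs[dst] = [sum(a * b for a, b in zip(row, v)) for row in m]
        elif op == "VAV":   # vector instruction: elementwise add
            regs[dst] = [a + b for a, b in zip(regs[src[0]], regs[src[1]])]
        elif op == "VSIG":  # vector instruction: elementwise sigmoid
            regs[dst] = [1.0 / (1.0 + math.exp(-a)) for a in regs[src[0]]]
        elif op == "STORE": # data transfer: scratchpad -> main memory
            memory[dst] = regs[src[0]]
    return memory

memory = {"W": [[1.0, 0.0], [0.0, 1.0]], "x": [2.0, -2.0], "b": [0.0, 0.0]}
program = [
    ("LOAD", "r0", "W"), ("LOAD", "r1", "x"), ("LOAD", "r2", "b"),
    ("MMV", "r3", "r0", "r1"),   # r3 = W * x
    ("VAV", "r4", "r3", "r2"),   # r4 = r3 + b
    ("VSIG", "r5", "r4"),        # r5 = sigmoid(r4)
    ("STORE", "y", "r5"),
]
run(program, memory)
print(memory["y"])  # sigmoid applied elementwise to [2.0, -2.0]
```

The key point this sketch illustrates is the abstract's argument for agility: the layer is composed from small, general instructions rather than a single monolithic "layer" instruction, so a different NN technique only requires a different instruction sequence, not new hardware control logic.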
- Srimat Chakradhar, Murugan Sankaradas, Venkata Jakkula, and Srihari Cadambi. A Dynamically Configurable Coprocessor for Convolutional Neural Networks. In Proceedings of the 37th Annual International Symposium on Computer Architecture, 2010.
- Yun-Fan Chang, P. Lin, Shao-Hua Cheng, Kai-Hsuan Chan, Yi-Chong Zeng, Chia-Wei Liao, Wen-Tsung Chang, Yu-Chiang Wang, and Yu Tsao. Robust anchorperson detection based on audio streams using a hybrid I-vector and DNN system. In Proceedings of the 2014 Annual Summit and Conference of the Asia-Pacific Signal and Information Processing Association, 2014.
- Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. DianNao: A Small-footprint High-throughput Accelerator for Ubiquitous Machine-learning. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, 2014.
- Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. A High-Throughput Neural Network Accelerator. IEEE Micro, 2015.
- Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam. DaDianNao: A Machine-Learning Supercomputer. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014.
- Ping Chi, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu, Yu Wang, and Yuan Xie. A Novel Processing-in-memory Architecture for Neural Network Computation in ReRAM-based Main Memory. In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA), 2016.
- A. Coates, B. Huval, T. Wang, D. J. Wu, and A. Y. Ng. Deep learning with COTS HPC systems. In Proceedings of the 30th International Conference on Machine Learning, 2013.
- G. E. Dahl, T. N. Sainath, and G. E. Hinton. Improving deep neural networks for LVCSR using rectified linear units and dropout. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
- V. Eijkhout. Introduction to High Performance Scientific Computing. lulu.com, 2011.
- H. Esmaeilzadeh, P. Saeedi, B. N. Araabi, C. Lucas, and Sied Mehdi Fakhraie. Neural network stream processing core (NnSP) for embedded systems. In Proceedings of the 2006 IEEE International Symposium on Circuits and Systems, 2006.
- Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, and Doug Burger. Neural Acceleration for General-Purpose Approximate Programs. In Proceedings of the 2012 IEEE/ACM International Symposium on Microarchitecture, 2012.
- C. Farabet, B. Martini, B. Corda, P. Akselrod, E. Culurciello, and Y. LeCun. NeuFlow: A runtime reconfigurable dataflow processor for vision. In Proceedings of the 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2011.
- C. Farabet, C. Poulet, J. Y. Han, and Y. LeCun. CNP: An FPGA-based processor for Convolutional Networks. In Proceedings of the 2009 International Conference on Field Programmable Logic and Applications, 2009.
- V. Gokhale, Jonghoon Jin, A. Dundar, B. Martini, and E. Culurciello. A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014.
- A. Graves and J. Schmidhuber. Framewise phoneme classification with bidirectional LSTM networks. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, 2005.
- Atif Hashmi, Andrew Nere, James Jamal Thomas, and Mikko Lipasti. A Case for Neuromorphic ISAs. In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011.
- Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. Learning Deep Structured Semantic Models for Web Search Using Clickthrough Data. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, 2013.
- INTEL. AVX-512. https://software.intel.com/en-us/blogs/2013/avx-512-instructions.
- INTEL. MKL. https://software.intel.com/en-us/intel-mkl.
- Fernando J. Pineda. Generalization of back-propagation to recurrent neural networks. Phys. Rev. Lett., 1987.
- Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning. 2013.
- K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun. What is the best multi-stage architecture for object recognition? In Proceedings of the 12th IEEE International Conference on Computer Vision, 2009.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. arXiv:1502.01852, 2015.
- V. Kantabutra. On hardware for computing exponential and trigonometric functions. IEEE Transactions on Computers, 1996.
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25, 2012.
- Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio. An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation. In Proceedings of the 24th International Conference on Machine Learning, 2007.
- Q. V. Le. Building high-level features using large scale unsupervised learning. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998.
- Daofu Liu, Tianshi Chen, Shaoli Liu, Jinhong Zhou, Shengyuan Zhou, Olivier Temam, Xiaobing Feng, Xuehai Zhou, and Yunji Chen. PuDianNao: A Polyvalent Machine Learning Accelerator. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015.
- A. A. Maashri, M. DeBole, M. Cotter, N. Chandramoorthy, Yang Xiao, V. Narayanan, and C. Chakrabarti. Accelerating neuromorphic vision algorithms for recognition. In Proceedings of the 49th ACM/EDAC/IEEE Design Automation Conference, 2012.
- G. Marsaglia and W. W. Tsang. The ziggurat method for generating random variables. Journal of Statistical Software, 2000.
- Paul A. Merolla, John V. Arthur, Rodrigo Alvarez-Icaza, Andrew S. Cassidy, Jun Sawada, Filipp Akopyan, Bryan L. Jackson, Nabil Imam, Chen Guo, Yutaka Nakamura, Bernard Brezzo, Ivan Vo, Steven K. Esser, Rathinakumar Appuswamy, Brian Taba, Arnon Amir, Myron D. Flickner, William P. Risk, Rajit Manohar, and Dharmendra S. Modha. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science, 2014.
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 2015.
- M. A. Motter. Control of the NASA Langley 16-foot transonic tunnel with the self-organizing map. In Proceedings of the 1999 American Control Conference, 1999.
- NVIDIA. CUBLAS. https://developer.nvidia.com/cublas.
- C. S. Oliveira and E. Del Hernandez. Forms of adapting patterns to Hopfield neural networks with larger number of nodes and higher storage capacity. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks, 2004.
- David A. Patterson and Carlo H. Sequin. RISC I: A Reduced Instruction Set VLSI Computer. In Proceedings of the 8th Annual Symposium on Computer Architecture, 1981.
- M. Peemen, A. A. A. Setio, B. Mesman, and H. Corporaal. Memory-centric accelerator design for Convolutional Neural Networks. In Proceedings of the 31st IEEE International Conference on Computer Design, 2013.
- R. Salakhutdinov and G. Hinton. An Efficient Learning Procedure for Deep Boltzmann Machines. Neural Computation, 2012.
- M. Sankaradas, V. Jakkula, S. Cadambi, S. Chakradhar, I. Durdanovic, E. Cosatto, and H. P. Graf. A Massively Parallel Coprocessor for Convolutional Neural Networks. In Proceedings of the 20th IEEE International Conference on Application-specific Systems, Architectures and Processors, 2009.
- R. Sarikaya, G. E. Hinton, and A. Deoras. Application of Deep Belief Networks for Natural Language Understanding. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014.
- P. Sermanet and Y. LeCun. Traffic sign recognition with multi-scale Convolutional Networks. In Proceedings of the 2011 International Joint Conference on Neural Networks, 2011.
- Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going Deeper with Convolutions. arXiv:1409.4842, 2014.
- O. Temam. A defect-tolerant accelerator for emerging high-performance applications. In Proceedings of the 39th Annual International Symposium on Computer Architecture, 2012.
- V. Vanhoucke, A. Senior, and M. Z. Mao. Improving the speed of neural networks on CPUs. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
- Yu Wang, Tianqi Tang, Lixue Xia, Boxun Li, Peng Gu, Huazhong Yang, Hai Li, and Yuan Xie. Energy Efficient RRAM Spiking Neural Network for Real Time Classification. In Proceedings of the 25th Edition on Great Lakes Symposium on VLSI, 2015.
- Cong Xu, Dimin Niu, Naveen Muralimanohar, Rajeev Balasubramonian, Tao Zhang, Shimeng Yu, and Yuan Xie. Overcoming the Challenges of Cross-Point Resistive Memory Architectures. In Proceedings of the 21st International Symposium on High Performance Computer Architecture, 2015.
- Tao Xu, Jieping Zhou, Jianhua Gong, Wenyi Sun, Liqun Fang, and Yanli Li. Improved SOM based data mining of seasonal flu in mainland China. In Proceedings of the 2012 Eighth International Conference on Natural Computation, 2012.
- Xian-Hua Zeng, Si-Wei Luo, and Jiao Wang. Auto-Associative Neural Network System for Recognition. In Proceedings of the 2007 International Conference on Machine Learning and Cybernetics, 2007.
- Zhengyou Zhang, M. Lyons, M. Schuster, and S. Akamatsu. Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron. In Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, 1998.
- Jishen Zhao, Guangyu Sun, Gabriel H. Loh, and Yuan Xie. Optimizing GPU energy efficiency with 3D die-stacking graphics memory and reconfigurable memory interface. ACM Transactions on Architecture and Code Optimization, 2013.