ABSTRACT
The training of neural networks (NNs) is usually time-consuming and resource-intensive. Memristors have shown their potential for NN computation. In particular, metal-oxide resistive random access memory (RRAM), with its crossbar structure and multi-bit cells, can perform the matrix-vector product, the most common operation in NNs, with high precision. However, two challenges remain in realizing NN training on RRAM. First, existing architectures support only the inference phase and cannot perform backpropagation (BP) or the weight updates of NNs. Second, NN training requires an enormous number of iterations and constantly updates the weights until convergence, which incurs large energy consumption from the many write and read operations. In this work, we propose a novel architecture, TIME, together with peripheral circuit designs, to enable NN training in RRAM. TIME supports BP and weight updates while maximizing the reuse of the peripheral circuits used for inference on RRAM. Meanwhile, a variability-free tuning scheme and gradually-write circuits are designed to reduce the cost of tuning RRAM. We explore the performance of both supervised learning (SL) and deep reinforcement learning (DRL) on TIME, and a DRL-specific mapping method is introduced to further improve energy efficiency. Experimental results show that, in SL, TIME achieves 5.3x higher energy efficiency on average than the most powerful application-specific integrated circuit (ASIC) in the literature. In DRL, TIME achieves, on average, 126x higher energy efficiency than a GPU. If the cost of tuning RRAM can be further reduced, TIME has the potential to boost energy efficiency by two orders of magnitude compared with ASICs.
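The analog matrix-vector product on a crossbar is the core primitive the abstract builds on: weights are stored as cell conductances, input voltages drive the rows, and the column currents realize the product in one step; a training step then maps to small conductance writes. The following is a minimal numpy sketch of that idea under idealized assumptions, not TIME's actual circuitry; the conductance range, learning rate, and gradient values are illustrative placeholders.

```python
import numpy as np

# Idealized RRAM crossbar: each weight W[i][j] is stored as a conductance
# G[i][j]. Applying input voltages v to the rows yields column currents
# I = G^T v (Ohm's law plus Kirchhoff's current law), i.e. a matrix-vector
# product computed in a single analog step.

rng = np.random.default_rng(0)

rows, cols = 4, 3
G = rng.uniform(1e-6, 1e-4, size=(rows, cols))  # cell conductances (S); assumed range
v = rng.uniform(0.0, 0.5, size=rows)            # input voltages (V)

I = G.T @ v  # column currents: the crossbar's analog matrix-vector product

# Training-in-memory sketch: a gradient-descent step on the stored weights
# becomes a small conductance update written back into the array. 'grad' is
# a placeholder for the backpropagated gradient of the loss w.r.t. the
# weights; in hardware, each write is a costly tuning operation, which is
# why the abstract emphasizes reducing RRAM tuning cost.
lr = 1e-2
grad = rng.normal(size=(rows, cols)) * 1e-5
G = np.clip(G - lr * grad, 1e-6, 1e-4)  # keep within the device's conductance range
```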