ABSTRACT
Deeper and larger Neural Networks (NNs) have made breakthroughs in many fields, but conventional CMOS-based computing platforms struggle to deliver the required energy efficiency. RRAM-based systems provide a promising alternative for building efficient Training-In-Memory Engines (TIME). However, the endurance of RRAM cells is limited, which is a severe issue because the weights of an NN must be updated thousands to millions of times during training. Gradient sparsification can mitigate this problem by dropping most of the smaller gradients, but it introduces unacceptable computation cost. We propose an effective framework, SGS-ARS, combining a Structured Gradient Sparsification (SGS) scheme with an Aging-aware Row Swapping (ARS) scheme, to guarantee write balance across whole RRAM crossbars and prolong the lifetime of TIME. Our experiments demonstrate that a 356× lifetime extension is achieved when TIME is programmed to train ResNet-50 on the ImageNet dataset with our SGS-ARS framework.
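The two mechanisms are easy to illustrate. Below is a minimal NumPy sketch, assuming each layer's gradient matrix maps row-for-row onto an RRAM crossbar; the function names, the row-norm selection criterion, the `row_fraction` parameter, and the write-counter bookkeeping are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sgs(grad, row_fraction=0.1):
    """Structured Gradient Sparsification, minimal sketch.

    Rather than dropping individual small gradients (which scatters
    writes across the crossbar), keep only the rows of the gradient
    matrix with the largest total magnitude and zero out the rest, so
    each update writes a few whole crossbar rows.
    """
    k = max(1, int(row_fraction * grad.shape[0]))
    row_scores = np.abs(grad).sum(axis=1)   # rank rows by gradient magnitude
    keep = np.argsort(row_scores)[-k:]      # indices of the top-k rows
    sparse = np.zeros_like(grad)
    sparse[keep] = grad[keep]               # all other rows are dropped
    return sparse, keep

def ars(row_map, write_counts):
    """Aging-aware Row Swapping, minimal sketch.

    row_map[l] is the physical crossbar row backing logical row l;
    write_counts[p] is the accumulated write count of physical row p.
    Periodically swap the most-worn mapping with the least-worn one to
    balance wear (physical data migration is elided here).
    """
    per_logical = write_counts[row_map]
    hot, cold = int(np.argmax(per_logical)), int(np.argmin(per_logical))
    row_map[hot], row_map[cold] = row_map[cold], row_map[hot]
    return row_map
```

Row-level selection is what makes the sparsification "structured": all surviving updates land in a handful of crossbar rows, so a cheap remapping table is enough for ARS to steer those writes away from aged rows.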