ABSTRACT
An RRAM-based computing system (RCS) is an attractive hardware platform for implementing neural computing algorithms. On-line training for RCS enables hardware-based learning for a given application and reduces the additional error caused by device parameter variations. However, a high occurrence rate of hard faults, due to immature fabrication processes and limited write endurance, restricts the applicability of on-line training for RCS. We propose a fault-tolerant on-line training method that alternates between a fault-detection phase and a fault-tolerant training phase. The fault-detection phase uses a quiescent-voltage comparison method. For the training phase, we propose a threshold-training method and a re-mapping scheme. Our results show that, compared to neural computing without fault tolerance, the recognition accuracy for the CIFAR-10 dataset improves from 37% to 83% when using low-endurance RRAM cells, and from 63% to 76% when using RRAM cells with high endurance but a high percentage of initial faults.
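The alternation between the two phases can be illustrated with a minimal sketch. The abstract does not define the individual methods, so everything below is an assumption made for illustration: threshold training is taken to mean skipping weight writes whose magnitude falls below a threshold (to save write endurance), and `detect_faults` is a hypothetical stand-in that flags a cell once its write count reaches an assumed endurance limit. The quiescent-voltage comparison and the re-mapping scheme are not modeled, and all function and parameter names are invented for this example.

```python
def threshold_update(weights, grads, lr, threshold, faulty, write_counts):
    """Write only updates with magnitude >= threshold; never write faulty cells."""
    for i, g in enumerate(grads):
        delta = -lr * g
        if i in faulty or abs(delta) < threshold:
            continue  # skipped write: the cell's endurance budget is untouched
        weights[i] += delta
        write_counts[i] += 1
    return weights


def detect_faults(write_counts, endurance):
    """Hypothetical detection phase: report cells whose writes reached `endurance`."""
    return {i for i, count in enumerate(write_counts) if count >= endurance}


def train(weights, grad_fn, steps, lr=0.1, threshold=0.01,
          endurance=100, detect_every=10):
    """Alternate a detection phase (every `detect_every` steps) with training."""
    write_counts = [0] * len(weights)
    faulty = set()
    for step in range(steps):
        if step % detect_every == 0:
            faulty = detect_faults(write_counts, endurance)
        grads = grad_fn(weights)
        threshold_update(weights, grads, lr, threshold, faulty, write_counts)
    return weights, write_counts, faulty


if __name__ == "__main__":
    target = [1.0, 1.0]
    # Gradient of the toy loss sum((w - t)^2)
    grad_fn = lambda w: [2 * (x - t) for x, t in zip(w, target)]
    w, counts, faulty = train([0.0, 1.0], grad_fn, steps=50)
    print(w, counts, faulty)
```

In this toy run the second weight already matches its target, so it is never written, and the first weight stops being written once its updates fall below the threshold. Under these assumptions, that is the sense in which thresholding trades a small residual error for fewer writes.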
- Fault-Tolerant Training with On-Line Fault Detection for RRAM-Based Neural Computing Systems