ABSTRACT
Resistive random-access memory (RRAM) and RRAM-based computing systems (RCS) offer energy-efficient technology options for neuromorphic computing. However, the applicability of RCS is limited by reliability problems that arise from an immature fabrication process. Fault-tolerant design is therefore a key challenge in bringing RCS to practical applications. We present a survey of fault-tolerant designs for RRAM-based neuromorphic computing systems. We first describe RRAM-based crossbars and training architectures in RCS. We then classify fault models into categories and review post-fabrication testing methods, followed by online testing methods. Finally, we present fault-tolerant techniques designed to tolerate different types of RRAM faults. The methods reviewed in this survey represent recent trends in fault-tolerant design of RCS and are expected to motivate further research in this field.
- Fault tolerance in neuromorphic computing systems