DOI: 10.1145/3061639.3062248
research-article

Fault-Tolerant Training with On-Line Fault Detection for RRAM-Based Neural Computing Systems

Published: 18 June 2017

ABSTRACT

An RRAM-based computing system (RCS) is an attractive hardware platform for implementing neural computing algorithms. On-line training for an RCS enables hardware-based learning for a given application and reduces the additional error caused by device parameter variations. However, a high occurrence rate of hard faults, caused by immature fabrication processes and limited write endurance, restricts the applicability of on-line training for RCS. We propose a fault-tolerant on-line training method that alternates between a fault-detection phase and a fault-tolerant training phase. In the fault-detection phase, a quiescent-voltage comparison method is used. In the training phase, a threshold-training method and a re-mapping scheme are proposed. Our results show that, compared with neural computing without fault tolerance, the recognition accuracy on the CIFAR-10 dataset improves from 37% to 83% when using low-endurance RRAM cells, and from 63% to 76% when using RRAM cells with high endurance but a high percentage of initial faults.
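The abstract names threshold training but does not describe it. As a hedged illustration only, not the authors' algorithm, the sketch below assumes that threshold training means writing a weight only when its proposed update is at least a threshold in magnitude, so that small updates do not consume the limited write endurance of RRAM cells. It also assumes the fault-detection phase produces a boolean mask of stuck cells that the training phase then leaves untouched. The function `threshold_update` and its parameters (`lr`, `threshold`, `fault_mask`) are hypothetical names chosen for this example.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Mask = List[List[bool]]


def threshold_update(
    weights: Matrix,
    grads: Matrix,
    lr: float,
    threshold: float,
    fault_mask: Mask,
) -> Tuple[Matrix, int]:
    """One hypothetical threshold-gated gradient step on a crossbar's weights.

    A cell is written only if (a) the proposed change -lr * grad has
    magnitude >= threshold and (b) the (assumed) fault-detection phase did
    not flag the cell as stuck. Returns the updated weights and the number
    of cell writes performed.
    """
    new_weights: Matrix = []
    writes = 0
    for w_row, g_row, f_row in zip(weights, grads, fault_mask):
        out_row: List[float] = []
        for w, g, faulty in zip(w_row, g_row, f_row):
            delta = -lr * g
            if abs(delta) >= threshold and not faulty:
                out_row.append(w + delta)  # write the cell: costs one endurance cycle
                writes += 1
            else:
                out_row.append(w)  # skip: update too small, or cell is stuck
        new_weights.append(out_row)
    return new_weights, writes


if __name__ == "__main__":
    w = [[0.0, 0.0], [0.0, 0.0]]
    g = [[1.0, 0.01], [-2.0, 0.5]]
    faults = [[False, False], [True, False]]  # cell (1, 0) flagged as stuck
    new_w, n_writes = threshold_update(w, g, lr=0.1, threshold=0.04, fault_mask=faults)
    print(new_w, n_writes)
```

In this toy run, the update for cell (0, 1) falls below the threshold and cell (1, 0) is masked as faulty, so only two of the four cells are written. A larger `threshold` trades update precision for fewer writes; how the paper actually selects the threshold is not stated in this abstract.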

    Published in

      DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017
      June 2017
      533 pages
      ISBN: 9781450349277
      DOI: 10.1145/3061639

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall acceptance rate: 1,770 of 5,499 submissions, 32%

