Published in: Journal of Electronic Testing 6/2018

24 October 2018

A Fine-Grained Software-Implemented DMA Fault Tolerance for SoC Against Soft Error

Authors: Xiaozhi Du, Dongyang Luo, Chaohui He, Shuhuan Liu



Abstract

In systems-on-chip (SoCs), DMA, as a peripheral module, plays an important role in data transfer. However, shrinking feature sizes make SoCs increasingly prone to radiation-induced soft errors, and DMA is especially vulnerable. This paper presents DCRH, a fine-grained software-implemented fault tolerance method for SoCs that enhances the reliability of DMA against soft errors. DCRH achieves fine-grained selective fault tolerance, protecting DMA without interfering with other SoC modules. Furthermore, it is transparent to the user application because it operates at the driver layer. We present our fault source analysis for DMA on the Xilinx Zynq-7010 SoC and the detailed design of DCRH. The method is then applied to a bare-metal MicroZed board to develop a DCRH-enhanced DMA driver. Finally, SSIFFI is used in simulated DMA fault injection experiments to validate DCRH. The experimental results show that DCRH achieves fault coverage above 97% for DMA with stable performance.

Metadata
Title
A Fine-Grained Software-Implemented DMA Fault Tolerance for SoC Against Soft Error
Authors
Xiaozhi Du
Dongyang Luo
Chaohui He
Shuhuan Liu
Publication date
24 October 2018
Publisher
Springer US
Published in
Journal of Electronic Testing / Issue 6/2018
Print ISSN: 0923-8174
Electronic ISSN: 1573-0727
DOI
https://doi.org/10.1007/s10836-018-5757-2
