ABSTRACT
A compilation technique for reliability-aware software transformations is presented. An instruction-level reliability estimation technique quantifies the effects of hardware-level faults at the instruction level, considering both spatial and temporal vulnerabilities. It bridges the gap between the hardware, where faults occur according to our fault model, and the software, the abstraction level at which we aim to increase reliability. For a given tolerable performance overhead, an optimization algorithm compiles application software according to a tradeoff between performance and reliability. Compared to performance-optimized compilation, our method results in 60% to 80% fewer application failures, averaged over various fault injection scenarios and fault rates.
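The abstract names two cooperating parts: a per-instruction vulnerability estimate that accounts for spatial and temporal exposure, and an optimizer that picks a compilation under a tolerable performance overhead. The following is a minimal sketch of that structure only, under loudly stated assumptions: the `Instr` and `Variant` types, the "vulnerable bits x exposure cycles" score, the idea of precompiled per-function variants, and the exhaustive search in `select` are all hypothetical stand-ins, not the paper's reliability model or optimization algorithm.

```python
"""Hypothetical sketch of reliability-aware variant selection under a
tolerable performance overhead. NOT the method described in the paper."""
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Instr:
    # Assumed per-instruction data for illustration only.
    vulnerable_bits: int   # spatial: bits of hardware state whose corruption matters
    exposure_cycles: int   # temporal: cycles during which those bits are exposed


def vulnerability(instrs):
    # Toy estimate: spatial exposure times temporal exposure, summed.
    return sum(i.vulnerable_bits * i.exposure_cycles for i in instrs)


@dataclass(frozen=True)
class Variant:
    # One hypothetical compiled version of a function.
    name: str
    cycles: int      # estimated execution time
    instrs: tuple    # instructions of this version

    @property
    def vuln(self):
        return vulnerability(self.instrs)


def select(functions, max_overhead):
    """Choose one variant per function, minimizing total vulnerability
    subject to total cycles <= (1 + max_overhead) * baseline, where the
    baseline (performance-optimized) uses the fastest variant of each
    function. Exhaustive search: exact, but exponential in the number
    of functions, so only suitable for small examples."""
    baseline = sum(min(v.cycles for v in vs) for vs in functions.values())
    budget = (1 + max_overhead) * baseline
    names = list(functions)
    best = None
    for combo in product(*(functions[n] for n in names)):
        if sum(v.cycles for v in combo) > budget:
            continue  # violates the tolerable performance overhead
        vuln = sum(v.vuln for v in combo)
        if best is None or vuln < best[0]:
            best = (vuln, {n: v.name for n, v in zip(names, combo)})
    return best  # (total vulnerability, chosen variant per function)


# Hypothetical example: a fast and a "safe" (less exposed) variant per function.
EXAMPLE = {
    "f": [Variant("fast", 100, (Instr(32, 10), Instr(32, 10))),
          Variant("safe", 120, (Instr(32, 2), Instr(32, 3)))],
    "g": [Variant("fast", 50, (Instr(16, 20),)),
          Variant("safe", 55, (Instr(16, 5),))],
}

if __name__ == "__main__":
    for overhead in (0.0, 0.05, 0.15, 0.2):
        print(overhead, select(EXAMPLE, overhead))
```

With no tolerated overhead the selector keeps the performance-optimized choice; as the budget grows it switches functions to their less vulnerable variants. Exhaustive search keeps the sketch obviously correct; a realistic compiler would need a heuristic or dynamic-programming formulation instead.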
Index Terms
- Reliable software for unreliable hardware: embedded code generation aiming at reliability