ABSTRACT
Driven by modern technology trends, fault tolerance (FT) is attracting ever-increasing research attention. Several techniques have been proposed to reduce the overhead introduced by FT features. One of them is Instruction-Level Fault Tolerance Configurability (ILCOFT). ILCOFT lets application developers protect different instructions to varying degrees, devoting more resources to the most critical instructions and saving resources by weakening the protection of the others. Assigning a proper protection level to every instruction is, however, not trivial. This work introduces the notion of the Instruction Vulnerability Factor (IVF), which evaluates how faults in each instruction affect the final application output. The IVF is computed off-line and is then used by ILCOFT-enabled systems to assign the appropriate protection level to every instruction. IVF thus relieves the programmer of the need to assign a protection level to every instruction by hand. Experimental results demonstrate that IVF-based ILCOFT reduces the instruction duplication performance penalty by up to 77%, while the maximum output damage due to undetected faults does not exceed 0.6% of the total application output.
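The off-line flow described in the abstract (inject faults into each instruction, measure the effect on the output, then pick a protection level) can be illustrated with a small sketch. The abstract does not give the IVF formula, so everything here is an assumption for illustration: the toy 16-bit program, the single-bit-flip fault model, the damage metric (fraction of corrupted output bits), and the `threshold` value are hypothetical and are not the authors' method.

```python
from statistics import mean

WIDTH = 16                      # assumed word width of the toy machine
WORD_MASK = (1 << WIDTH) - 1

# Hypothetical toy program: (destination, operation, source A, source B).
PROGRAM = [
    ("r2", "add", "r0", "r1"),
    ("r3", "mul", "r2", "r2"),
    ("r4", "and", "r3", "mask"),   # masking hides faults in the upper bits of r3
    ("out", "add", "r4", "r0"),
]

OPS = {
    "add": lambda a, b: (a + b) & WORD_MASK,
    "mul": lambda a, b: (a * b) & WORD_MASK,
    "and": lambda a, b: a & b,
}

# Fixed, made-up input sets so the estimate is deterministic.
INPUTS = [
    {"r0": a, "r1": b, "mask": 0x00FF}
    for a, b in [(3, 5), (100, 27), (1234, 4321), (65535, 1)]
]


def run(inputs, faulty_index=None, flip_bit=0):
    """Execute PROGRAM; optionally flip one bit in one instruction's result."""
    regs = dict(inputs)
    for index, (dest, op, src_a, src_b) in enumerate(PROGRAM):
        value = OPS[op](regs[src_a], regs[src_b])
        if index == faulty_index:
            value ^= 1 << flip_bit   # assumed fault model: single bit flip
        regs[dest] = value
    return regs["out"]


def output_damage(golden, faulty):
    """Assumed damage metric: fraction of output bits that differ."""
    return bin(golden ^ faulty).count("1") / WIDTH


def estimate_ivf():
    """Average output damage per instruction over all inputs and bit positions."""
    ivf = []
    for index in range(len(PROGRAM)):
        damages = [
            output_damage(run(inp), run(inp, index, bit))
            for inp in INPUTS
            for bit in range(WIDTH)
        ]
        ivf.append(mean(damages))
    return ivf


def assign_protection(ivf, threshold=0.05):
    """Duplicate instructions whose estimated IVF reaches the (hypothetical) threshold."""
    return ["duplicate" if v >= threshold else "unprotected" for v in ivf]


IVF = estimate_ivf()
LEVELS = assign_protection(IVF)
```

In this toy setting the `mul` instruction scores lower than the `and` that follows it, because the mask discards faults in the upper half of its result. That is the kind of per-instruction difference a configurable protection scheme could exploit.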
Index Terms
- Protective redundancy overhead reduction using instruction vulnerability factor