ABSTRACT
Many computer applications have stringent requirements for continued correct operation in the presence of internal faults. The design of such highly reliable computers has been studied extensively, and numerous techniques have been developed to achieve this reliability. Such computers are termed "fault tolerant"; applications are found in the aerospace industry, communication systems, and computer networks. Several designs of such systems have been proposed, and some have been implemented. In general, these designs contain extensive hard-wired logic for functions such as fault masking, comparison, switching, and encoding-decoding.
- SIFT: software implemented fault tolerance