In previous chapters, we introduced the processes of checking and testing, the first of the three main processes of the Generalized Algorithm of Fault Tolerance (GAFT). In this chapter, we discuss the checking of hardware in more detail: first software-based hardware checking, and then hardware-based checking. For software-based hardware checking, we show what a software-based test should include, when such tests are preferable to hardware-based checking schemes, and, in particular, how they can be scheduled in the system without interfering with ongoing real-time tasks. To support hardware-based checking, we introduce a new system condition descriptor, the so-called syndrome, and illustrate how it can serve as a mechanism for signaling the hardware condition, including a detected error, to the operating system. We then show the steps the runtime system performs to eliminate the fault and, in the case of permanent errors, how software can reconfigure the hardware to exclude the faulty element. We also explain in which cases software has to adapt to the new hardware topology. We start by explaining how software-based checks can be used to detect hardware faults.

Runtime systems use online or offline scheduling mechanisms to manage the tasks of both their own system software and user applications. Since [1–4], a runtime system has been expected to provide a dedicated task-scheduling session (offline, or online during execution) for diagnosing the hardware condition; recall the start-up delays of Apple and Microsoft systems. Later, for systems operating in the domain of real-time monitoring, where task execution is time-critical, hardware availability and the efficiency of process scheduling became crucial. In turn, testing itself becomes “hot” in terms of the time it requires and its coverage of the hardware. Therefore, in this chapter, we first analyze simple sequences for testing the hardware elements of computer systems. We then introduce the concept of a hardware-testing procedure that is transparent to user applications. This makes it possible to prove the integrity of the computer system hardware and to guarantee it within a reasonable time, without delaying the execution of user tasks.
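The idea of hardware testing that is transparent to user applications can be illustrated with a minimal Python sketch. It assumes, purely for illustration, a cyclic real-time schedule (one repeating frame of time slots) in which every slot is either owned by a user task or idle. It also assumes a hardware test suite split into short, non-preemptible segments, one per hardware element. The segments are placed greedily, in order, into idle gaps only. This is plain background (slack-filling) scheduling, not necessarily the method developed in this chapter, so user tasks are never delayed. The number of frames needed until every element has been tested bounds the time required to confirm hardware integrity. The names `TestSegment`, `idle_gaps`, and `schedule_tests`, as well as the element names, are hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TestSegment:
    element: str   # hardware element covered by this segment (hypothetical names)
    duration: int  # time units needed; a segment is non-preemptible


def idle_gaps(frame: list[str | None]) -> list[tuple[int, int]]:
    """Return (start, length) for each maximal run of idle (None) slots in one frame."""
    gaps: list[tuple[int, int]] = []
    start = None
    # A non-None sentinel at the end closes a trailing idle run.
    for t, owner in enumerate(frame + ["<end>"]):
        if owner is None and start is None:
            start = t
        elif owner is not None and start is not None:
            gaps.append((start, t - start))
            start = None
    return gaps


def schedule_tests(frame: list[str | None], tests: list[TestSegment],
                   max_frames: int = 100) -> list[tuple[int, int, str]]:
    """Greedily place test segments, in order, into idle gaps of repeated frames.

    Slots owned by user tasks are never used, so user tasks keep their timing.
    Returns (frame_index, start_slot, element) for every placed segment.
    """
    gaps = idle_gaps(frame)
    largest = max((length for _, length in gaps), default=0)
    if any(t.duration > largest for t in tests):
        raise ValueError("a test segment does not fit into any idle gap")

    pending = deque(tests)
    placements: list[tuple[int, int, str]] = []
    for k in range(max_frames):
        for start, length in gaps:
            cursor = start
            # Fill this gap while the next segment still fits in the remaining space.
            while pending and pending[0].duration <= start + length - cursor:
                seg = pending.popleft()
                placements.append((k, cursor, seg.element))
                cursor += seg.duration
        if not pending:
            return placements  # every element has been tested at least once
    raise RuntimeError("full coverage not reached within max_frames")


# Usage: 'A' and 'B' are user real-time tasks; None marks an idle slot.
frame = ["A", "A", None, None, "B", None, "A", None]
tests = [TestSegment("ALU", 2), TestSegment("RAM", 1),
         TestSegment("BUS", 1), TestSegment("CACHE", 2)]
plan = schedule_tests(frame, tests)
print(plan)  # [(0, 2, 'ALU'), (0, 5, 'RAM'), (0, 7, 'BUS'), (1, 2, 'CACHE')]
```

In this example the four segments fit into the idle slots of two frames, so full coverage is reached within 16 time units while tasks A and B keep their original slots. A real system would additionally have to account for the worst-case execution time of each test segment and for errors detected partway through the suite.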