2020 | OriginalPaper | Book chapter

7. Testing, Checking, and Hardware Syndrome

Written by: Igor Schagaev, Eugene Zouev, Kaegi Thomas

Published in: Software Design for Resilient Computer Systems

Publisher: Springer International Publishing


Abstract

In previous chapters, we introduced checking and testing, the first of the three main processes of the Generalized Algorithm of Fault Tolerance (GAFT). In this chapter, we discuss hardware checking further: first software-based hardware checking, then hardware-based checking. For software-based hardware checking, we show what a software-based test should include, when such tests are preferable to hardware-based checking schemes, and, in particular, how they can be scheduled without interfering with ongoing real-time tasks. To support hardware-based checking, we introduce a new system condition descriptor, the so-called syndrome, and illustrate how it can be used to signal the hardware condition to the operating system, including the manifestation of a detected error. We then show the steps the runtime system performs to eliminate the fault and, in the case of permanent errors, how the software can reconfigure the hardware to exclude the faulty element. We also explain in which cases software has to adapt to the new hardware topology. We start by explaining how software-based checks can be used to detect hardware faults. Runtime systems use online or offline scheduling mechanisms to manage tasks, both their own system software tasks and user application tasks. Since [14], a runtime system has been expected to provide a special task-scheduling session (offline, or online during execution) for diagnosing hardware conditions; recall the start-up delays of Apple and Microsoft systems. Later, for systems that operate in the domain of real-time monitoring, the scheduling of time-critical tasks, and especially hardware availability and the efficiency of process scheduling, became crucial. In turn, testing itself becomes “hot” in terms of required time and hardware coverage.
Thus, in this chapter, we first analyze simple test sequences for the hardware elements of computer systems. We then introduce the concept of a hardware testing procedure that is transparent to user applications. This makes it possible to prove the integrity of the computer system hardware, and to guarantee it within a reasonable time, without delaying the execution of user tasks.
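
The idea of testing hardware without delaying user tasks can be illustrated with a minimal sketch. It is not the chapter's algorithm: it assumes a hypothetical frame-based schedule in which each frame has a fixed length and a known amount of time reserved for real-time tasks, and it assumes the hardware test can be cut into equal slices. The names `Frame`, `place_test_slices`, `slice_ms`, and `slices_needed` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One scheduling frame (hypothetical model, not taken from the chapter)."""
    length_ms: int  # total frame length
    busy_ms: int    # time already reserved for real-time tasks

    @property
    def slack_ms(self) -> int:
        # Idle time left in the frame; never negative.
        return max(0, self.length_ms - self.busy_ms)


def place_test_slices(frames, slice_ms, slices_needed):
    """Place hardware-test slices into idle time only.

    Returns a list with the number of slices placed in each frame.
    Raises ValueError if the whole test does not fit into the idle time.
    """
    placed = []
    remaining = slices_needed
    for frame in frames:
        # A frame takes as many whole slices as its slack allows.
        fit = min(frame.slack_ms // slice_ms, remaining)
        placed.append(fit)
        remaining -= fit
    if remaining > 0:
        raise ValueError(f"{remaining} test slices do not fit in the idle time")
    return placed


frames = [Frame(10, 7), Frame(10, 4), Frame(10, 9), Frame(10, 2)]
plan = place_test_slices(frames, slice_ms=2, slices_needed=6)
print(plan)  # [1, 3, 0, 2]
```

Because slices are placed only in slack time, the time reserved for real-time tasks in every frame is untouched. The trade-off is that a complete test takes several frames, so the time needed to confirm hardware integrity depends on how much idle time the schedule leaves.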


References
1. Kirby W et al (1985) The NMFECC Cray time-sharing system. Softw Pract Exper 15(1):87–103
2. Serlin O (1984) Fault-tolerant systems in commercial applications. Computer C 7(8):19–30
3. Blazewicz J et al (2007) Handbook on scheduling, from theory to applications. Springer, Berlin, Heidelberg
4. Ingo M (2002) Linux kernel archive. World Wide Web electronic publication, January 03, 2002
5. Bogdanov J, Schagaev I (1990) Sliding slotting diagnosis in multiprocessors. In: IMECO congress proceedings, pp 141–150
6. Garey M, Johnson D (1979) Computers and intractability: a guide to the theory of NP-completeness. W.H. Freeman and Company
7. Knuth D (1998) The art of computer programming 3. Sorting and searching, vol III. Addison-Wesley Longman, Amsterdam
8. Johannes M (2002) The active object system-design and multiprocessor implementation. ETH Zurich, Zurich
9. Liu CL, Layland J (1973) Scheduling algorithms for multiprogramming in a hard-real-time environment. J ACM 20(1):46–61
10. Castano V, Schagaev I (2014) Resilient computer system design. Springer. ISBN 978-3-319-15069-7
11. Blaeser L, Monkman S, Schagaev I (2014) Evolving systems Worldcomp 2014. In: Proceedings of the international conference on foundations of computer science FCS’14, 2014. CSREA Press. ISBN 1-60132-270-4
Metadata
Title
Testing, Checking, and Hardware Syndrome
Written by
Igor Schagaev
Eugene Zouev
Kaegi Thomas
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-21244-5_7
