
About this book

This book provides basics and selected advanced insights on how to achieve reliability, safety, and resilience in the development of (socio-)technical systems. The focus is on working definitions, fundamental development processes, safety development processes, and analytical methods that support them. The method families of hazard analysis, failure modes and effects analysis, and fault tree analysis are explained in detail. Further main topics include semi-formal graphical system modeling, requirement types, the hazard log, reliability prediction standards, techniques and measures for reliable hardware and software with respect to systematic and statistical errors, and options for combining methods. The book is based on methods applied in numerous applied research and development projects and in the support and auditing of such projects, including highly safety-critical automated and autonomous systems. Numerous questions and answers challenge students and practitioners.

Table of contents

Frontmatter

Chapter 1. Introduction and Objectives

Abstract
The introductory chapter gathers several narratives on why systematic processes, techniques, and measures, in particular classical and modified analytical system analysis methods, are key for the ideation, concept and design, development, verification and validation, implementation, and testing of reliable, safe, and resilient systems. It provides the disciplinary structuring and ordering principles employed by the text, the main dependencies between the chapters, and features of the text that increase readability and relevance for practitioners and experts. Chapters that can be understood as stand-alone introductions to key system analysis methods, processes, and method combination options are highlighted. More recent applied research backgrounds are provided in the domains of automotive engineering of electric vehicles, indoor localization systems, and sustainable energy supply of urban quarters. The chapter shows that the book provides selected analysis methods and processes sufficient for achieving reliable, safe, and, to a lesser extent, resilient systems by resorting to classical system analysis methods, including their tailoring and expected extensions.
Ivo Häring

Chapter 2. Technical Safety and Reliability Methods for Resilience Engineering

Abstract
Resilience of technical and socio-technical systems can be defined as their capability to behave in an acceptable way before, during, and after potentially dangerous or disruptive events, i.e., in each phase of the resilience cycle and overall. Hence, technical safety and reliability methods and processes are strong candidate approaches for engineering the resilience of such systems. This also holds when the set of methods is restricted to classical safety and reliability assessment methods, e.g., classical hazard analysis (HA) methods, inductive failure mode and effects analysis (FMEA), deductive fault tree analysis (FTA), reliability block diagrams (RBDs), event tree analysis (ETA), and reliability prediction. Such methods have the advantage that they are already used in industrial practice. However, improving the resilience of systems is not their explicit aim. The present chapter covers how to allocate such methods to different resilience assessment, response, development, and resilience management work phases, and to tasks or conceptual entities when engineering resilience from a technical perspective. To this end, several assessment and analysis schemes and risk control and resilience enhancement process schemes are employed, as well as the resilience or disruption response cycle. Each concept and its related process can be regarded as a dimension to be taken into account when generating risk control and resilience. In particular, the resilience dimensions of risk management, resilience objectives, resilience cycle time phases, technical resilience capabilities, and system layers are used explicitly to explore the range of applicability of the methods. Also, typical graphical system modeling, hardware, and software development methods are assessed to document the usability of technical reliability and safety methods for resilience analytics and for the technical engineering of resilience.
Ivo Häring

Chapter 3. Basic Technical Safety Terms and Definitions

Abstract
Well-defined key terms are crucial for developing safe, reliable, and resilient systems, in particular when developments are interdisciplinary or when several departments of a company, different companies, or different industry sectors participate. This also holds true if originally different safety standard traditions are used, e.g., machine safety, production safety, and functional safety. The chapter shows that a systematic ordering of key terms can be obtained by sorting them according to the input terms they need, ascending from basic to higher-level concepts. For example, from system to hazard events and risks of the system, to acceptable single and overall risk and resilience, to risk reduction, to functional safety, to safety integrity level (SIL) for on-demand and continuous operation, to safety-related system, etc. Also, definitions of timelines and processes are given, e.g., product life cycle, development life cycle, and safety life cycle. A further ordering principle is that methods, i.e., techniques and measures including organizational practices, support well-defined processes. The chapter also distinguishes between system understanding, system modeling, system simulation, system analysis, and evaluation of system analysis results.
Ivo Häring

Chapter 4. Introduction to System Modeling for System Analysis

Abstract
System analysis methods are used to systematically gain information about a system. They should start as early as possible because costs caused by undetected deficiencies typically increase exponentially over the course of product development. An accompanying analysis of a product also helps the analyst to better understand the product and to take up implicit developer knowledge. The chapter introduces general concepts and terms of system analysis: a working definition of modern systems, system boundaries, inductive and deductive analysis, graphical and tabular approaches, and qualitative and quantitative approaches. Different expected and unexpected failure types are explained from the component and system perspectives. Verbose definitions are given for reliability, safety, and different kinds of redundancy. Using a combinatorial approach, the chapter explains why theoretical system analyses are more practicable than testing system samples.
Ivo Häring

Chapter 5. Introduction to System Analysis Methods

Abstract
This chapter gives an overview of classical system analysis methods. A representative example is given for each of the methods. The examples are not intended to be sufficient for actually applying the methods. However, they support the selection of a suitable type of method by listing the main analysis objectives of the methods. The categorization of methods in terms of graphical versus tabular, inductive versus deductive, and qualitative versus quantitative is refined by considering implementation examples, phases of developments, and life cycles where the methods are used. Methods covered include fault hazard analysis (FHA) and failure modes and effects analysis (FMEA).
Ivo Häring

Chapter 6. Fault Tree Analysis

Abstract
Fault tree or root cause analysis (FTA) is the most frequently used graphical deductive system analysis method. It is by now the workhorse in all fields with higher demands regarding reliability, availability, and safety. It is increasingly also used in the domain of engineering resilience for assessing whether the response to expectable disruptions is sufficiently efficient in terms of the time needed to detect the disruption, to stabilize the system, and to repair or respond and recover. FTAs are conducted in several steps: identification of the goal of the analysis and of the relevant top events to be considered, system analysis, graphical construction of the fault tree using construction principles and rules, translation into Boolean algebra, determination of minimal cut sets (top-down and bottom-up), probabilistic calculation of top events (using the inclusion–exclusion principle), importance analysis, and executive summary. Each step is illustrated and several computation examples are provided. Major advantages of FTA include its targeted deductive nature, which focuses the system understanding and modeling used for the analysis by working in failure and success space as appropriate, the option to conduct the analysis qualitatively and quantitatively, and the evaluation options in terms of critical root cause combinations and minimal cut sets as determined by various importance measures. Further developments of FTA that are briefly discussed include its application to security issues in terms of attack trees, its application to the cybersecurity domain by typically assuming that most possible events actually occur, and time-dependent FTA (TDFTA). TDFTA goes beyond considering system phase- or use case-specific applications of FTA by considering time-dependent system states. Typically, such approaches are based on Markov or Petri net models of systems. Another topic is the combination of Monte Carlo approaches or fuzzy theory (fuzzification) with fault trees.
Using distributions instead of point failure probabilities makes it possible to consider statistical uncertainties and deep uncertainties within FTA models of systems. In addition, several application examples are provided, in particular on how to use importance measures to identify component improvements that lead to the most efficient and economic improvement of systems.
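The quantitative steps named in this abstract (minimal cut sets, inclusion–exclusion, importance analysis) can be illustrated with a minimal sketch. It is not taken from the book: the example tree, the event names, the probabilities, and the choice of the Birnbaum measure as the importance measure are assumptions made for illustration, and basic events are assumed to be statistically independent.

```python
from itertools import combinations


def top_event_probability(min_cut_sets, p):
    """Top-event probability via inclusion-exclusion over minimal cut sets.

    Assumes statistically independent basic events (assumption of this sketch).
    """
    total = 0.0
    for k in range(1, len(min_cut_sets) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for group in combinations(min_cut_sets, k):
            # Several cut sets occur together only if every basic event
            # in their union occurs.
            events = frozenset().union(*group)
            prob = 1.0
            for event in events:
                prob *= p[event]
            total += sign * prob
    return total


def birnbaum_importance(min_cut_sets, p, event):
    """Birnbaum measure: P(top | event occurs) - P(top | event does not occur)."""
    occurred = {**p, event: 1.0}
    not_occurred = {**p, event: 0.0}
    return (top_event_probability(min_cut_sets, occurred)
            - top_event_probability(min_cut_sets, not_occurred))


# Hypothetical fault tree: TOP = A OR (B AND C)
cut_sets = [frozenset({"A"}), frozenset({"B", "C"})]
probabilities = {"A": 0.01, "B": 0.1, "C": 0.2}

p_top = top_event_probability(cut_sets, probabilities)
importance_a = birnbaum_importance(cut_sets, probabilities, "A")
importance_b = birnbaum_importance(cut_sets, probabilities, "B")
```

For this hypothetical tree the top-event probability is 0.01 + 0.02 - 0.0002, i.e., approximately 0.0298. Event A has a much higher Birnbaum importance than B, so under these assumptions improving A would reduce the top-event probability most. The number of inclusion–exclusion terms grows exponentially with the number of cut sets, so the exact sum is only practical for small trees.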
Ivo Häring

Chapter 7. Failure Modes and Effects Analysis

Abstract
The inductive tabular system analysis method failure modes and effects analysis (FMEA) and its variants are the most frequently used type of system analysis method. The chapter presents the classical definitions of the FMEA method as well as its application domains at the system (functional) concept, system design, and production process levels. Such distinctions are by now often considered as application cases only. The steps to conduct successful FMEAs are provided and discussed: analysis aim formulation, preparation, structural analysis, functional analysis, failure analysis, measure analysis, optimization, and executive summary. It is emphasized that modern FMEAs take up many elements of similar tabular analyses such as hazard analysis, and of classical extensions such as asking for root causes, immediate consequences of failures, or criticality assessment of failures on system level (FMECA). Some templates even ask for potential additional failures leading to catastrophic events, pointing toward approaches like the double failure matrix. Modern extensions such as FMEDA are introduced, which assess the diagnostic coverage (DC) and safe failure fraction (SFF) of failures as required for functional safety approaches. In all cases, an emphasis is on applicable options for evaluating the obtained results that are sufficient for decision-making regarding system improvements: semi-quantitative and quantitative assessment of risks, including the risk priority number (RPN); consideration or neglect of detection options, avoidance options, frequency, and severity within risk maps; and efficiency of improvement measures. Advantages, disadvantages, and limitations of FMEAs are discussed. Several FMEA templates are provided, including sample column entries, and are compared and applied within worked-out examples.
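The semi-quantitative evaluation with the risk priority number mentioned in this abstract can be sketched as follows. The sketch uses the classical product form RPN = severity x occurrence x detection; the 1 to 10 rating scale, the class and function names, and the failure modes are hypothetical and not taken from the book's templates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int    # assumed scale: 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # assumed scale: 1 (rare) .. 10 (frequent)
    detection: int   # assumed scale: 1 (almost surely detected) .. 10 (undetectable)

    def rpn(self) -> int:
        """Classical risk priority number: severity * occurrence * detection."""
        for rating in (self.severity, self.occurrence, self.detection):
            if not 1 <= rating <= 10:
                raise ValueError("ratings are expected on a 1..10 scale")
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(failure_modes):
    """Return the failure modes sorted from highest to lowest RPN."""
    return sorted(failure_modes, key=lambda fm: fm.rpn(), reverse=True)


# Hypothetical FMEA rows, for illustration only.
modes = [
    FailureMode("sensor signal lost", severity=8, occurrence=3, detection=4),
    FailureMode("connector corrosion", severity=5, occurrence=6, detection=7),
    FailureMode("watchdog timeout", severity=9, occurrence=2, detection=2),
]
ranked = rank_by_rpn(modes)
```

In this made-up table the failure mode with the highest severity ends up with the lowest RPN. This is one reason why severity, frequency, and detection are often also considered separately, for example in risk maps.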
Ivo Häring

Chapter 8. Hazard Analysis

Abstract
In contrast to fault tree analysis (FTA), which focuses on root causes of failures on system level, and to failure mode and effects analysis (FMEA), which focuses on effects of single failures on system level, hazard analysis (HA) focuses on hazards (potential sources of high risks, hazard sources, hazard modalities, kinds of hazards, hazard modes) and their resulting risks on system level. It is argued that hazard analysis can be considered as inductive when starting from the hazard types. However, it allows for the (management) summary of bottom-up inductive as well as top-down deductive system analysis results. This is shown by relating hazard analysis to other approaches, in particular, the collection of results of other methods and the identification of needs for more refined assessments. The following hazard analyses are introduced, related to each other, and summarized using a matrix: preliminary hazard list (PHL), preliminary hazard analysis (PHA), subsystem hazard analysis (SSHA), system hazard analysis (SHA), and operation and support hazard analysis (O&SHA). The hazard log documents past failures of similar systems during development, testing, and in the field, and their implications for system design and analyses. A further focus is on the evaluation of hazard analyses, including risk comparison, risk criteria, risk acceptance matrices, and risk graphs. All of them can also be used to determine requirements for safety functions, in particular, safety integrity levels (SIL). The different hazard analysis types are related to the phases of the functional safety life cycle according to IEC 61508 and to a general system development process. Application examples include a standardization approach for a highly safety-critical domain that summarizes best practices for hazard analyses, FMEA, FTA, reliability prediction, and hazard log, in particular, in which order to conduct the analysis methods and how to link them and all the tabular approaches.
Several hazard analysis templates and application examples are provided.
Ivo Häring

Chapter 9. Reliability Prediction

Abstract
Reliability prediction of a system estimates the probability that the system fails in a certain time interval given system context information. To this end, several standards based on different data, failure modes, and models have been developed. The chapter provides a tabular comparison of standards as well as the formal expressions that are necessary to understand their structure. It defines the term reliability as a time-dependent probability expression of a system. Related terms such as availability and dependability are distinguished. The following terms are related to each other: failure density distribution, failure probability, reliability, and failure rate. Expressions are provided for the Weibull, exponential, and lognormal distributions. It is shown that most reliability prediction standards only use constant failure rates, i.e., exponential distributions, as opposed to the more realistic bathtub curve. Multiplicative, additive, and mixed reliability prediction standards are distinguished, as are standards that consider components only and standards that in addition consider system development conditions. Existing standards are compared regarding, inter alia, their coverage of types of environmental stress, component functional loading, and mission profiles of the systems under consideration, as well as the failure models used and the components and technologies covered. To this end, a comprehensive list of comparison items for standards and related software tools is provided. The chapter shows that reliability prediction complements the range of system analysis methods by providing event frequency input for fault tree analysis (FTA), failure mode and effects analysis (FMEA), and Markov analysis. As field data is more easily accessible due to increasing sensing, intelligence, and connectivity, it is expected that scaling, updating, and even generation of empirical reliability predictions become feasible in ever more application domains.
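The relations between reliability, failure probability, failure density, and failure rate named in this abstract can be sketched with the standard textbook expressions for the exponential and the two-parameter Weibull distribution. The function and parameter names (lam, beta, eta) and the numerical values are assumptions of this sketch and not notation taken from the book.

```python
import math


def exponential_reliability(t, lam):
    """R(t) = exp(-lam * t) for a constant failure rate lam."""
    return math.exp(-lam * t)


def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t / eta) ** beta), shape beta, characteristic life eta."""
    return math.exp(-((t / eta) ** beta))


def weibull_failure_rate(t, beta, eta):
    """Failure rate h(t) = f(t) / R(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def weibull_failure_density(t, beta, eta):
    """Failure density f(t) = h(t) * R(t)."""
    return weibull_failure_rate(t, beta, eta) * weibull_reliability(t, beta, eta)


def failure_probability(reliability):
    """F(t) = 1 - R(t)."""
    return 1.0 - reliability


# Hypothetical values: characteristic life of 2000 h, evaluated at 1000 h.
r_weibull = weibull_reliability(1000.0, beta=1.0, eta=2000.0)
r_exponential = exponential_reliability(1000.0, lam=1.0 / 2000.0)
```

With beta = 1 the Weibull expression reduces to the exponential case with a constant failure rate of 1/eta. Shape values below 1 give a decreasing failure rate and values above 1 an increasing one, which is how the early-failure and wear-out regions of a bathtub curve are commonly approximated.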
Ivo Häring

Chapter 10. Models for Hardware and Software Development Processes

Abstract
System development processes have historically been driven by the scarcity and availability of resources and knowledge. Examples in the software domain include computation time, memory, programming costs, or the local accessibility of cloud computing, and in the hardware domain the costs of real-time testing, emulated or virtual testing, and verification. At the same time, an insight that is intellectually accepted but often not put into practice is that the increasing complexity of systems and developments requires the informed selection and implementation of structured development processes. A further challenging requirement is to leverage tacit or implicit rules, decision-making processes, and technical processes to design development processes. For instance, loose requirements may lead to long-term inconsistencies and expensive efficiency losses at later stages of developments, whereas formal requirements for system specifications hinder fast prototyping for early customer feedback. Hence, it is important to balance process requirements with process flexibility expectations. Within this challenging context, the chapter introduces different types of software and hardware development processes. However, instead of following the doctrines of any of the plethora of existing system development processes and their communities, it first identifies generic properties of development processes by introducing characteristics including linear, iterative, incremental, big-bang integration, agile, technical, and organizational. Next, representative examples of models for software and hardware development processes are introduced, including versions of the waterfall model, the spiral model, the V-model, and the Scrum model. In each case, the development process models are characterized using the introduced attributes, and the advantages and disadvantages of the models are discussed.
The expectation is that this supports better selection and combination of development process models, in particular regarding their relation to the methods to be conducted within process steps, and clarifies the relation of development models to concepts such as the product life cycle and the functional safety life cycle.
Ivo Häring

Chapter 11. The Standard IEC 61508 and Its Safety Life Cycle

Abstract
The international standard IEC 61508 on functional safety of electrical/electronic/programmable electronic (EEPE) safety-related systems describes a procedure to develop safe systems. It claims to be applicable to all systems that contain safety-related EEPE systems and where the failure of such EEPE systems causes significant risk for humans or the environment. As a generic level A standard, it has to be adapted to the application domain. This can be done using existing application level B or C standards or, if they are not (yet) available, by informed application of the generic standard to a new domain. The efficient application of functional safety to advancing and new technology domains is key for successful products and short time to market. The chapter describes the standard IEC 61508, starting with a brief summary of how the standard was developed and of its updating history. The names of the different parts of the standard and a scheme to describe the general structure are provided. After recalling definitions and concepts from IEC 61508 that were already introduced in the textbook, it adds selected further terms, e.g., equipment under control (EUC), safety-related system, complexity of a component (type A and B components), and hardware failure tolerance (HFT), as well as a formal definition of safety function in terms of its qualitative and quantitative properties. This makes it possible to transfer the functional safety approach to domains where reliable (active) functions need to be realized using EEPE systems. The chapter introduces the functional safety life cycle with its 16 phases by giving a summary of the objectives, inputs, and outputs of each phase. For each phase, sample methods are given to fulfill its requirements, in particular methods that are covered within the textbook, e.g., for the determination of safety integrity levels (SIL).
To this end, an overview is also given of the methods recommended by IEC 61508 and of how they are linked to the V-model development processes for hardware and software of EEPE systems. Finally, the safety life cycle is characterized and compared with standard development processes.
Ivo Häring

Chapter 12. Requirements for Safety-Critical Systems

Abstract
The identification of existing safety functions of legacy systems, the determination of requirements for standard safety functions, and especially the development of innovative and resource-efficient new safety functions are key for the development of efficient and sustainable safety-relevant or safety-critical systems. For instance, it is not yet clear which functions of autonomous driving can be considered as reliability functions (intended functions), which need to be considered as safety-critical functions, and which as both. In this case, it is obvious that tremendous economic, societal, and individual interests drive such safety-critical system developments and introductions. An example of an attempt to standardize parts of the verification and validation of automotive intended system functions is given by the Safety of the Intended Functionality standard ISO/PAS 21448, which is complementary to ISO 26262, itself an application standard of IEC 61508. The development and implementation of safety-related functions, even in standard situations, requires approximately twice the resources of standard developments. Therefore, it is important to identify safety functions that are sufficient, resource-efficient, and economically, societally, and legally accepted. This includes taking advantage of any possible innovations to develop them. To this end, this chapter introduces properties (dimensions, aspects) of safety requirements. Several adjectives, mostly in pairs, are listed with which safety requirements can be classified. Examples for safety function dimensions include active and passive; abstract and concrete; technical and non-technical; qualitative and quantitative; time-critical and not time-critical; static and dynamic; pre-, during, and post-hazard event; cause and effect oriented; generating risk control or improving resilience; standardized and non-standardized; module and system specific; and intelligent and non-intelligent.
Some safety functions might be very successful but are not yet used, e.g., due to past technological gaps. Examples for safety requirements and classifications are provided. The chapter concludes by identifying which combinations of properties are likely to appear and which are not yet often used and therefore offer potential for innovation.
Ivo Häring

Chapter 13. Semi-Formal Modeling of Multi-technological Systems I: UML

Abstract
Unified Modeling Language (UML) is a semi-formal graphical language that is used in software development and other domains, including railway system modeling, general requirements and systems engineering, supply-chain management, and enterprise and business process modeling. UML offers profiles and extensions that enable the customization of UML to specific domains as well as its formalization by reducing and extending it to a formal graphical language, e.g., a unique representation of a formal state machine model. Systems Modeling Language (SysML) can be understood as UML applied to systems engineering of multi-technological systems. Semi-formal modeling (UML and SysML) is used in companies for system specification and requirements formulation and tracing; for system development, testing, verification, and validation; and for the communication between different development teams, company parts, and subcontractors. The chapter provides these backgrounds as well as limitations of semi-formal modeling. The focus of the chapter is the selection, self-consistent presentation, and exemplary use of UML and selected SysML diagrams and their elements sufficient for the specification of safety-critical requirements. UML is also used to trace the fulfillment of safety requirements. Representative safety requirements are modeled, and finally recommendations are given regarding the selection of diagram types for the efficient modeling of different types of safety requirements as well as how to trace their fulfillment. Example diagrams used include the UML class diagram, UML composite structure diagram, UML state diagram, UML sequence diagram, UML activity diagram, and SysML requirements diagram.
Ivo Häring

Chapter 14. Semi-formal Modeling of Multi-technological Systems II: SysML Beyond the Requirements Diagram

Abstract
The chapter introduces semi-formal graphical Systems Modeling Language (SysML) diagrams sufficient for modeling and tracing safety requirements. It builds on the previous chapter on the modeling and tracing of safety requirements using the Unified Modeling Language (UML). The introduced minimum set of SysML diagrams and graphical diagram elements is sufficient to document system knowledge, to analyze system safety, and for requirements modeling and tracing. It is argued that SysML, being a multi-domain semi-formal language for systems and requirements specification, systems engineering, and requirements tracing, is less abstract than UML while containing further diagrams that are well suited for efficient graphical modeling of engineering requirements, in particular, the SysML requirements diagram (req). It is used together with the SysML block definition diagram (bdd), the SysML internal block diagram (ibd), the SysML activity diagram (act), the SysML use case diagram (uc), the SysML state machine diagram (stm), and the SysML sequence diagram (sd). For each diagram, typical applications and examples, including figures, are given of how to use it for modeling safety requirements. The diagram types bdd, ibd, stm, and act are already sufficient for modeling most types of technical safety requirements when using a state machine approach that considers the structure and behavior of the socio-technical system. It is shown how this can be contextualized with the diagram types uc and sd to cover system application cases. The chapter provides examples mainly in the domain of embedded systems.
Ivo Häring

Chapter 15. Combination of System Analysis Methods

Abstract
One may ask how single analytical system analysis methods can be leveraged for technically driven resilience engineering of systems using key concepts of risk control. More comprehensively, one can ask for suitable method combinations. The chapter presents exemplary discussions on method combinations for system reliability, safety, and resilience analysis and improvement. Selection and ordering principles include level of detail and completeness of methods, development processes, and the functional safety life-cycle assessment and development process of IEC 61508. To this end, appropriate earlier findings of the textbook are extended. One specific efficient combination is considered in detail: Systems Modeling Language (SysML), hazard analysis (HA), failure modes and effects analysis (FMEA), fault tree analysis (FTA), and reliability prediction. It uses the example of an electric vehicle and the identification of faults in after-sales scenarios. First, the chapter discusses the advantages of semi-formal modeling with SysML in combination with FMEA and FTA. Then it addresses the connection of HA to other system analysis methods and the combination of FMEA and FTA. It treats the aggregation of subsystem FTAs to a system FTA. Finally, it shows how FTA results can be used after product development to optimize error detection and repair by providing efficient failure isolation procedures, also called fault isolation procedures (FIP). This is an example of the engineering of resilience in the sense of fast stabilization, response, and recovery after potentially disruptive events during the operation of modern green transport systems.
Ivo Häring

Chapter 16. Error Detecting and Correcting Codes

Abstract
Error-correcting coding can be defined as the art of adding redundancy to stored data or messages efficiently so that distortions can be detected and correctly revised. In the context of functional safety according to IEC 61508, this can be addressed as a method to control statistical hardware failures in the case of memory bit flips or bit flips in wired or wireless (telemetry) messages. This includes other bit changes such as bit shifts. However, within classical functional safety approaches, the correction of data in case of detected errors is not recommended. Hence, any wrong corrections need to be controlled. To understand the mechanisms of detection and correction, elementary examples are first considered in detail: parity bit, the Hamming(7,4) code, and cyclic redundancy check (CRC) checksums. Based on this, an example is given of how a safety-critical time is stored using various error-detection and error-correction codes within an embedded, highly safety-critical system. It is investigated which error-detection and error-correction scheme is most efficient considering limited resources within a given application case in terms of critical and non-critical corrections. The following schemes are compared for one up to five bit flips: BCD, BCD with XOR, BCD with 2oo3, BCD with Hamming(7,4), Hamming(7,4,1), Hamming(21,16), and Hamming(21,16,1). Inter alia, correct and incorrect corrections are distinguished, as well as safety-critical and non-safety-critical corrections. The results confirm that the suitability of error correction for safety-critical systems must be assessed very carefully within the operational context, in particular when there is no safe default state, e.g., in scenarios of fast-moving autonomous transport systems. In such cases, it must be shown that the probability of wrong correction within the operational context leads to acceptable system risks.
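The difference between a correct and a wrong correction can be illustrated with a minimal sketch of the Hamming(7,4) code. The bit layout (parity bits at positions 1, 2, and 4), the function names, and the data word are assumptions of this sketch; it does not reproduce the stored-time example or the comparison of schemes from the chapter.

```python
def hamming74_encode(data):
    """Encode 4 data bits as [p1, p2, d1, p3, d2, d3, d4] (assumed layout)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Flip the single bit indicated by the syndrome and return (data, syndrome)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 0 means no error detected
    if syndrome:
        c[syndrome - 1] ^= 1  # the decoder assumes exactly one flipped bit
    return [c[2], c[4], c[5], c[6]], syndrome


def flip(codeword, *indices):
    """Return a copy of the codeword with the given 0-based bits inverted."""
    corrupted = list(codeword)
    for i in indices:
        corrupted[i] ^= 1
    return corrupted


data = [1, 0, 1, 1]  # hypothetical data word
codeword = hamming74_encode(data)

# One bit flip: the syndrome points at the flipped bit and the data is restored.
single_result, single_syndrome = hamming74_decode(flip(codeword, 4))

# Two bit flips: the syndrome points at a third bit, so the decoder
# "corrects" the wrong position and silently returns wrong data.
double_result, double_syndrome = hamming74_decode(flip(codeword, 0, 1))
```

In the two-flip case the decoder reports a non-zero syndrome, changes a bit that was intact, and delivers a data word that differs from the original. This is the kind of wrong correction whose probability has to be shown to be acceptable in the operational context.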
Ivo Häring

Backmatter
