
1981 | Book

Reliability and Maintainability in Perspective

Technical, management and commercial aspects

Author: David J Smith, BSc, C.Eng, FIEE, FIQA

Publisher: Macmillan Education UK

Table of Contents

Frontmatter

Understanding Terms, Parameters and Costs

Frontmatter
1. How Important are Reliability and Maintainability?
Abstract
Reference is often made in this type of literature to the spectacular reliability of many nineteenth-century engineering feats. Telford and Brunel indeed left a heritage of longstanding edifices such as the Menai and Clifton bridges. Fame is secured by their continued existence but little is remembered of the failures of their age. If, however, we concentrate on the success and seek to identify which characteristics of design or construction have given them a life span and freedom from failure far in excess of many twentieth-century products, then two important considerations arise.
David J Smith
2. A Realistic Approach is Cost Conscious
Abstract
The practice of identifying quality costs is not new although it is only very large organisations that collect and analyse this highly significant proportion of their turnover. Attempts to set budget levels for the various elements of quality costs are even rarer as is the planning of activities for achieving them. This is unfortunate since the contribution of any activity to a business is measured ultimately in financial terms and the activities of quality, reliability and maintainability can claim no exception. If the costs of failure and repair were more fully reported and compared with the costs of achieving improvements then greater strides would be made in improving the position of this branch of engineering management. Greater recognition leads to the allocation of more resources. The pursuit of quality and reliability for their own sake is no justification for the investment of labour, plant and materials. Value Engineering, Work Study, Computer Planning, and other functions are quick to demonstrate that the savings generated by their activities more than offset the expenses involved.
David J Smith
3. Understanding Terms and Jargon
Abstract
Before any discussion involving these terms can take place it is essential that the word FAILURE is fully defined and understood. Unless the failed state is defined it is impossible to explain the meaning of Quality or of Reliability. There is only one definition of failure and that is:
NON-CONFORMANCE TO SOME DEFINED PERFORMANCE CRITERION
Refinements of definitions which differentiate between terms such as Defect, Malfunction, Failure, Fault and Reject are important in contract clauses and in classification and analysis of data but should not be allowed to cloud the understanding of the main parameters. The different definitions of these terms merely include and exclude failures by type, cause, degree or use. Given any specific definition of failure there is no ambiguity in the definitions of quality and reliability. Since failure is defined as departure from specification then to define different types of failure implies the existence of different performance specifications. Table 3.1 gives an indication of the classification of failures.
David J Smith

Achieving Reliability and Maintainability Objectives

Frontmatter
4. Design and Assurance for Reliability and Maintainability
Abstract
It is essential to believe that the design of an item establishes its potential reliability and maintainability. It is a fact of life that the transition from drawings to hardware always results in achieved levels lower than the original design objectives. It is therefore necessary to design to specified levels of reliability and maintainability higher than the field requirements and to follow up with assurance activities aimed at minimising the failures which arise during manufacture and use. Figure 1.1 illustrates the concept of a reliability level fixed by the ‘drawings’ and a lower field level due to the many failure possibilities arising from manufacture and subsequent use. The author makes no apology for repetition since this is fundamental to Reliability Engineering. This chapter will outline the activities and techniques which can be used both in design, to develop equipment which can meet the R and M objectives and also in manufacture and use, to minimise failures.
David J Smith
5. Design Factors Influencing Down Time
Abstract
The two main factors governing down time are equipment design and maintenance philosophy. In general it is the active repair elements which are determined by the design and the passive elements which are governed by the maintenance philosophy. The designer must be aware of the maintenance environment and of the possible equipment failure modes. He must understand that production difficulties all too often become field problems since, if assembly is difficult, maintenance will be well nigh impossible. Achieving acceptable repair times involves facilitating diagnosis and repair. The main design parameters are as follows.
David J Smith
6. Maintenance Philosophy and Down Time
Abstract
Both active and passive repair times are influenced by factors other than equipment design. Consideration of maintenance procedures, personnel, and spares provisioning is known as Maintenance Philosophy and plays an important part in determining overall availability. The costs involved in these activities are considerable and it is therefore important to strike a balance between over and underemphasising each factor. They can be grouped under six headings:
  • Organisation of maintenance resources.
  • Tools and Test Equipment.
  • Personnel — selection, training and motivation.
  • Maintenance instructions and manuals.
  • Spares provisioning.
  • Logistics.
David J Smith
7. Analysis of Failure Mode and Stress
Abstract
The probability of a device failing at any instant is very sensitive to the stress applied to it. Stresses, which can be classified as environmental or self generated, include:
The overall sum of these stresses is often pictured as constantly varying, having peaks and troughs, and superimposed on a distribution of strength levels for a group of devices. A failure is assumed to be the result of stress exceeding strength. The average strength of the group of devices will increase during the early failures period due to the elimination, from the population, of the weaker items. During wear out strength declines as a result of physical and chemical processes. An overall change of the average stress will cause more of the peaks to exceed the strength values and more failures will result. Figure 7.1 illustrates this concept showing a range of strength throughout the bathtub together with a superimposed strain.
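The stress-versus-strength picture above can be made concrete with a small calculation. The following is only a sketch under an assumption the abstract does not make: that stress and strength are independent and normally distributed. The function name and the numeric values are hypothetical and chosen for illustration.

```python
import math
from statistics import NormalDist


def interference_probability(strength_mean, strength_sd, stress_mean, stress_sd):
    """Probability that stress exceeds strength.

    Assumes (for illustration only) independent normal distributions.
    The margin (strength - stress) is then normal as well, and a failure
    corresponds to a negative margin.
    """
    margin_mean = strength_mean - stress_mean
    margin_sd = math.hypot(strength_sd, stress_sd)  # sqrt(sd1^2 + sd2^2)
    return NormalDist(margin_mean, margin_sd).cdf(0.0)


# Hypothetical values in arbitrary units: raising the average stress
# moves more of the stress peaks above the strength values.
p_low_stress = interference_probability(100.0, 10.0, 60.0, 10.0)
p_high_stress = interference_probability(100.0, 10.0, 70.0, 10.0)
```

Under these assumptions `p_high_stress` is larger than `p_low_stress`, which mirrors the statement that an overall increase in average stress results in more failures.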
David J Smith
8. Design and Qualification Testing
Abstract
There are three categories of testing:
  • Design Testing — Laboratory and prototype tests aimed at proving that a design will meet the specification. Initially bread board functional tests aimed at proving the design. This will extend to preproduction models which undergo environmental and reliability tests and may overlap with:
  • Qualification Testing — Total proving cycle using production models over the full range of the environmental and functional specification. This involves extensive marginal tests, climatic and shock tests, reliability and maintainability tests and the accumulation of some field data. It must not be confused with development or production testing. There is also:
  • Production Testing and Commissioning — Verification of conformance by testing modules and complete equipment. Some reliability proving and burn in may be involved. Generally failures will be attributable to component procurement, production methods, etc. Design related queries will arise but should diminish in quantity as production continues.
Acceptance testing implies a formal demonstration and may apply to qualification or to production tests depending upon the circumstances. In the former case a contract development may lead to a formal demonstration of design conformance and in the latter, equipment already in manufacture may undergo demonstration tests for reasons of quality audit or customer inspection.
David J Smith
9. Quality Assurance and Automatic Test Equipment
Abstract
Quality is simply defined as conformance to specification. It is not, in the engineering context, a measure of excellence. The simple and inexpensive item which conforms to specification is therefore of ‘higher’ quality than an elaborate and expensive item which does not. The purpose of Quality Assurance is to set up and operate a set of controls whereby the appropriate activities in design, procurement and manufacturing, test, installation and maintenance are carried out to ensure that products meet the specification at MINIMUM cost (see chapter 2). These activities fall into four areas.
David J Smith
10. Maintenance Handbooks
Abstract
The main objective of a maintenance manual is to provide all the information required to carry out each maintenance task without reference to the base workshop, design authority or any other source of information. It may, therefore, include the following:
  • Specification of system performance and functions.
  • Theory of operation and usage limitations.
  • Method of operation.
  • Range of operating conditions.
  • Supply requirements.
  • Corrective and preventive maintenance routines.
  • Permitted modifications.
  • Description of spares and alternatives.
  • List of test equipment and its check procedure.
  • Disposal instructions for hazardous materials.
The manual may range from a simple card, which could hang on a wall, to a small library of information comprising many handbooks for different applications and users. Field reliability and maintainability depend, to a large degree, on the maintenance instructions. The design team, or the maintainability engineer, has to supply information to the handbook writer and to collaborate with him if the instructions are to be effective.
David J Smith
11. Making Use of Field Feedback
Abstract
Failure data can be collected from prototype and production models or from the field. In either case a formal failure reporting document is necessary in order to ensure that the feedback is both consistent and adequate. Field information is far more valuable since it concerns failures and repair actions which have taken place under real conditions. Since recording field incidents relies on people it is subject to errors, omissions and misinterpretation. It is therefore important to collect all field data using a formal document. Information of this type has a number of uses, the main two being feedback, resulting in modifications to prevent further defects, and the acquisition of statistical reliability and repair data. In detail then:
  • They indicate design and manufacture deficiencies and can be used to support reliability growth programmes.
  • They provide quality and reliability trends.
  • They provide subcontractor ratings.
  • They contribute statistical data for future reliability and repair time predictions.
  • They assist second line maintenance (workshop).
  • They enable spares provisioning to be refined.
  • They enable routine maintenance intervals to be revised.
  • They enable the field element of quality costs to be identified.
David J Smith

Making Measurements and Predictions

Frontmatter
12. Interpreting Data and Demonstrating Reliability
Abstract
This chapter deals with the interpretation of failure rates and MTBFs for the special case where random failures are assumed. We are dealing thus with constant failure rates and the equality λ = 1/θ applies. The next chapter will explore the analysis of variable failure rates.
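As a minimal sketch of the constant failure rate case, the general expression R(t) = exp[-∫λ dt] reduces to R(t) = exp(-λt) when λ does not vary with time, and the MTBF θ is 1/λ. The function names and the failure rate value below are hypothetical, chosen only for illustration.

```python
import math


def mtbf(failure_rate):
    """MTBF (theta) for a constant failure rate: theta = 1 / lambda."""
    return 1.0 / failure_rate


def reliability(failure_rate, t):
    """R(t) = exp(-lambda * t), valid only when the failure rate is constant."""
    return math.exp(-failure_rate * t)


lam = 1e-4                            # hypothetical failure rate, per hour
theta = mtbf(lam)                     # MTBF in hours
r_at_mtbf = reliability(lam, theta)   # reliability over one MTBF, equal to exp(-1)
```

One consequence worth noting is that the probability of surviving for a period equal to the MTBF is exp(-1), roughly 37 per cent, rather than 50 per cent.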
David J Smith
13. Interpreting Variable Failure Rate Data
Abstract
The bathtub curve in figure 3.2 showed that, in addition to random failures, there are distributions of increasing and decreasing failure rate. In these variable failure rate cases it is of little value to consider the actual failure rate since only Reliability and MTBF are meaningful. In chapter 3 we saw that:
$$R(t) = \exp\left[ -\int_0^t \lambda(t)\,dt \right]$$
Since the relationship between failure rate and time takes many forms, and depends on the device in question, the integral cannot be evaluated for the general case. Even if the variation of failure rate with time were known it might well be of such a complicated nature that the integration would prove far from simple. In practice it is found that the relationship can usually be described by the following three-parameter distribution known as the Weibull Distribution.
$$R(t) = \exp\left[ -\left( \frac{t - \gamma}{\eta} \right)^{\beta} \right]$$
In the constant failure rate case it was seen that statements of MTBF and reliability could be made from the failure rate parameter which completely defined the distribution. In the Weibull case the reliability function requires three parameters (γ, β, η). They do not have a physical meaning as does failure rate and must be treated as merely numbers which allow us to compute reliability and MTBF. In the special case of γ = 0 and β = 1 the expression reduces to the simple exponential case with η = MTBF. This is slightly misleading because in the general case η is not equal to MTBF.
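The point that η equals the MTBF only in the special case can be checked numerically. The sketch below evaluates the three-parameter Weibull reliability function given above and uses the standard result for the mean of a Weibull distribution, MTBF = γ + η Γ(1 + 1/β), which is not stated in the abstract and is included here as an assumption. The function and argument names are hypothetical: `location`, `shape` and `scale` stand for γ, β and η.

```python
import math


def weibull_reliability(t, location, shape, scale):
    """R(t) = exp(-((t - gamma) / eta) ** beta).

    location = gamma, shape = beta, scale = eta.
    No failures are possible before t = gamma, so R(t) = 1 there.
    """
    if t <= location:
        return 1.0
    return math.exp(-(((t - location) / scale) ** shape))


def weibull_mtbf(location, shape, scale):
    """Mean of the Weibull distribution: gamma + eta * Gamma(1 + 1/beta)."""
    return location + scale * math.gamma(1.0 + 1.0 / shape)


# Special case gamma = 0, beta = 1: the MTBF equals eta.
mtbf_exponential = weibull_mtbf(0.0, 1.0, 1000.0)

# Hypothetical wear-out case, beta = 2: the MTBF is no longer equal to eta.
mtbf_wearout = weibull_mtbf(0.0, 2.0, 1000.0)
```

With η = 1000 hours, the first case gives an MTBF of 1000 hours, while the β = 2 case gives about 886 hours, illustrating why η should not be read as the MTBF in general.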
David J Smith
14. Demonstrating Maintainability
Abstract
Where demonstration of a maintainability requirement is contractual it is essential that the test method, and the conditions under which it is to be carried out, are fully described. If this is not observed then disagreements are likely to arise during the demonstration. Both supplier and customer wish to achieve the specified Mean Time To Repair at minimum cost and yet a precise demonstration having acceptable risks to all parties is extremely expensive. A true assessment of maintainability can only be made at the end of the equipment life and anything less will represent a sample carrying the risks described in chapters 12 and 13.
David J Smith
15. Reliability Prediction
Abstract
Whilst it is component failure rate that is measured the reliability of complete equipment and systems is the ultimate concern of the designer and customer. Reliability prediction is the process of calculating the anticipated system reliability from assumed component failure rates. It provides a quantitative measure of how close a design comes to meeting the reliability objective and also permits comparisons between alternative design proposals. The simplest type of prediction involves little more than a parts count. Individual stress levels are not considered and an average failure rate for each component type is multiplied by the number involved. The overall total failure rate is used to calculate the system MTBF or reliability. It will be seen in section 15.3 that this simple addition of failure rates takes no account of redundancy and therefore gives a worst case prediction. It was mentioned in section 7.5 that failure rate data usually refers to random failures (flat portion of the bathtub). As a result ‘parts count’ reliability predictions involve constant failure rates and the summing of failure rates is permissible. This is not always the case and the exceptions to this procedure will be clearly explained in this chapter. As the design details become firmer more sophisticated predictions can be attempted taking account of failure modes, redundancy of parts and modules, stresses and environment and the quality and screening of components. Some examples of typical failure rate data are given in Appendix 3 and are expressed in terms of 10⁻⁹ per hour.
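A parts count prediction of the kind described above might be sketched as follows. The component list and failure rates are hypothetical values invented for illustration (they are not taken from Appendix 3), and the calculation assumes constant failure rates with no redundancy, so that the rates simply add.

```python
import math

# Hypothetical parts list: (component type, quantity, failure rate in 1e-9 per hour)
PARTS = [
    ("resistor", 120, 5.0),
    ("capacitor", 60, 10.0),
    ("transistor", 30, 50.0),
    ("integrated circuit", 10, 200.0),
]


def parts_count_failure_rate(parts):
    """Total failure rate per hour: sum of quantity times average rate."""
    return sum(quantity * rate for _, quantity, rate in parts) * 1e-9


total_rate = parts_count_failure_rate(PARTS)      # failures per hour
system_mtbf = 1.0 / total_rate                    # hours
one_year_reliability = math.exp(-total_rate * 8760.0)  # 8760 hours in a year
```

For this hypothetical list the total is 4700 × 10⁻⁹ per hour, giving a system MTBF of roughly 213 000 hours. Because redundancy is ignored, the figure is a worst case in the sense described in the abstract.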
David J Smith
16. Prediction of Repair Times
Abstract
Maintainability prediction is by no means as well developed as reliability prediction. The only fully developed systems are described in US MIL HDBK 472 which is dated 1966. In 1973 this document was withdrawn and replaced by US MIL STD 471A which contains no prediction techniques. The methods described in US MIL HDBK 472, although applicable to a range of equipment developed at that time, have much to recommend them and are still worthy of attention. Unfortunately the quantity of data required to develop these methods of prediction is so great that with increasing costs and shorter design lives the author fears that such exercises may not be repeated. On the other hand calculations requiring the statistical analysis of large quantities of data lend themselves to computer methods and the rapid increase of these facilities makes such a calculation feasible if the necessary repair time data for a very large sample of repairs (say 10000) were available.
David J Smith

Essential Management Topics

Frontmatter
17. Project Management
Abstract
Realistic reliability and maintainability objectives need to be set with due regard to the customer’s design and operating requirements and cost constraints. Some discussion and joint study with the customer may be required to establish economic reliability values which sensibly meet his requirements and are achievable within the proposed technology at the costs allowed for. Over-specifying the requirement may delay the project when tests eventually show that objectives cannot be met and it is realised that budgets will be exceeded. When specifying an MTBF it is a common mistake to include a confidence level; in fact the MTBF requirement stands alone. The addition of a confidence level implies a demonstration and supposes that the MTBF would be established by a single demonstration at the stated confidence. On the contrary, a design objective is a target and must be stated without statistical limitations.
David J Smith
18. Contract Clauses and their Pitfalls
Abstract
Since the late 1950s in the United States reliability and maintainability requirements have appeared in both military and civil engineering contracts. These contracts often carry penalties for failure to meet these objectives. For some years in the UK suppliers of military and commercial electronic and telecommunication equipment have also found that clauses specifying reliability and maintainability are being included in invitations to tender and in the subsequent contracts. Suppliers of highly reliable and maintainable equipment are often well able to satisfy such conditions with little or no additional design or manufacturing effort, but incur difficulty and expense since a formal demonstration of these parameters may not have been attempted before. Furthermore a failure reporting procedure may not exist and therefore historical data as to a product’s reliability or repair time may be unobtainable. The inclusion of system effectiveness parameters in a contract involves both the suppliers of good and poor equipment in additional activities. System Effectiveness clauses in contracts range from a few words — specifying failure rate or MTBF of all or part of the system — to some ten or twenty pages containing details of design and test procedures, methods of collecting failure data, methods of demonstrating reliability and repair time, limitations on component sources, limits to size and cost of test equipment, and so on. Two types of pitfall arise from such contractual conditions.
David J Smith
19. Product Liability
Abstract
Product liability is the liability of a supplier, designer or manufacturer to the customer for injury or loss resulting from a defect in that product. There are two main reasons why it has recently become the focus of attention. The first is the recent publication of a draft directive by the European Economic Community and the second is the wave of actions under United States Law which have resulted in spectacular awards for claims involving death or injury. In 1977 the average sum awarded resulting from court proceedings was $256000. Changes in the United Kingdom are inevitable but it is first necessary to review the current position.
David J Smith
20. A Case Study
Abstract
This chapter is a case study which has been used by the author on Reliability and Maintainability Management and contract courses for nearly 10 years. It is not intended to represent any actual company, product or individuals.
David J Smith
21. Software and Reliability
Abstract
There has been a spectacular growth during the 1970s in the use of programmable devices. These are generally described as microprocessors and they have made a significant impact on methods of electronic circuit design. The main effect has been to reduce the number of different circuit types by the use of computer architecture coupled with software programming which provides the individual circuit features previously achieved by differences in hardware. The word software refers to any programme needed to enable a computer type device to function. This development of programming at the circuit level, now common with most industrial and consumer products, brings with it the associated quality and reliability problems. When applied to microprocessors at the circuit level, the programming, which is semi-permanent and usually contained in ROM (Read Only Memory), is known as Firmware. The necessary increase in function density of devices in order to provide the large quantities of memory in small packages has matched this trend. Computing, and its associated software, is seen in three broad categories:
  • Mainframe computing — Isolated processing of large quantities of data and no interaction with real time events. Known as ‘data crunching’.
  • Minicomputing — Interactive processing where real time events are monitored and control of peripheral devices is provided.
David J Smith
Backmatter
Metadata
Title
Reliability and Maintainability in Perspective
Author
David J Smith, BSc, C.Eng, FIEE, FIQA
Copyright Year
1981
Publisher
Macmillan Education UK
Electronic ISBN
978-1-349-16649-7
Print ISBN
978-0-333-31049-6
DOI
https://doi.org/10.1007/978-1-349-16649-7