
2023 | Book

Springer Handbook of Engineering Statistics


About this Book

This handbook gathers the full range of statistical techniques and tools required by engineers, analysts, and scientists from all fields. The book is a comprehensive reference for methods and solutions to practical problems in - but not limited to - data science and quality assurance in design and production engineering.

The tools of engineering statistics are relevant not only for the modeling and prediction of products, processes, and services, but also for the analysis of ongoing processes, the reliability and life-cycle assessment of products and services, and, finally, for achieving realistic predictions of how to improve processes and products.

This book contains contributions from around 115 leading experts in statistics, biostatistics, engineering statistics, reliability engineering, and related areas. It covers the various methods as well as their applications, from industrial control to failure mechanisms and analysis, medicine, business intelligence, electronic packaging, and risk management. Its wide selection of statistical techniques and tools enables readers to choose the most appropriate method.

For the second edition, all chapters have been thoroughly updated to reflect the current state of the art. Also included are more than 30 completely new contributions on current trends in modern statistical computing, including data science, big data, machine learning, optimization, data fusion, high-dimensional data, voting systems, life testing, related statistical artificial intelligence (AI), and reliability physics and failure mode mechanisms.

This Springer Handbook of Engineering Statistics provides comprehensive coverage of up-to-date statistical methodologies, algorithms, computational methods, and tools, and can serve as a main reference for researchers, engineers, business analysts, educators, and students in all applied fields affected by statistics.

Table of Contents

Frontmatter
57. Correction to: Monitoring Coefficient of Variation Using CUSUM Control Charts
Phuong Hanh Tran, Huu Du Nguyen, Cédric Heuchenne, Kim Phuc Tran

Fundamental Statistics and Its Applications

Frontmatter
1. Basic Statistics

This chapter presents some fundamental elements of engineering probability and statistics with which some readers are probably already familiar, but others may not be. Statistics is the study of how best to describe and analyze data and then draw conclusions or inferences based on the data available. The first section of this chapter begins with some basic definitions, including probability axioms, basic statistics, and reliability measures. The second section describes the most common distribution functions, such as the binomial, Poisson, geometric, exponential, normal, lognormal, Student's t, gamma, Pareto, beta, Rayleigh, Cauchy, Weibull, Pham, and Vtub-shaped failure rate distributions, their applications, and their use in engineering and applied statistics. The third section describes statistical inference, including parameter estimation and confidence intervals. Statistical inference is the process by which information from sample data is used to draw conclusions about the population from which the sample was selected, in the hope that the sample represents the whole population. This discussion also introduces the maximum likelihood estimation (MLE) method, the method of moments, MLE with censored data, the statistical change-point estimation method, nonparametric tolerance limits, sequential sampling, and Bayesian methods. Finally, the last section provides a short list of books and articles for readers who are interested in advanced engineering and applied statistics.

Hoang Pham
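
As a hedged illustration of the estimation ideas summarized in this chapter's abstract (not code from the chapter itself), the following sketch computes the maximum likelihood estimate of an exponential failure rate from right-censored data, together with a large-sample confidence interval; the lifetimes and censoring time are invented for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical right-censored lifetimes (hours); event = 1 marks an observed failure,
# event = 0 marks a unit still running when the test stopped at 60 hours.
times = np.array([12.0, 35.5, 47.2, 60.0, 60.0, 18.9, 52.3, 60.0])
event = np.array([1, 1, 1, 0, 0, 1, 1, 0])

# MLE of an exponential failure rate with right censoring:
# lambda_hat = (number of observed failures) / (total time on test).
r = event.sum()
ttt = times.sum()
lam_hat = r / ttt

# Large-sample 95% confidence interval from the observed information (about r / lambda^2).
se = lam_hat / np.sqrt(r)
z = stats.norm.ppf(0.975)
print(f"lambda_hat = {lam_hat:.4f}/h, 95% CI = ({lam_hat - z*se:.4f}, {lam_hat + z*se:.4f})")
```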
2. Generalized Statistical Distributions

There has been increased interest in developing generalized families of distributions by introducing additional shape parameters into a baseline cumulative distribution. This mechanism has proved useful for making the generated distributions more flexible than existing ones, especially for studying tail properties, and for improving their goodness-of-fit statistics for the data under study. Let G(x) be the cumulative distribution function (CDF) of a baseline distribution and g(x) = dG(x)∕dx be the associated probability density function (PDF). We present generalized families with one and two additional shape parameters obtained by transforming the CDF G(x) according to four important generators. These families are important for modeling data in several engineering areas. Many special distributions in these families are discussed by Tahir and Nadarajah (An Acad Bras Cienc 87(2):539–568, 2015).

Gauss M. Cordeiro, Artur Lemonte
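
As a small, hedged illustration of the generator idea described above, the sketch below uses the simple one-parameter exponentiated-G generator, F(x) = G(x)^a (which is not necessarily one of the four generators emphasized in the chapter), to build an exponentiated-Weibull distribution from a Weibull baseline; the parameter values are arbitrary.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def exp_g_cdf(x, a, baseline):
    """Exponentiated-G CDF: F(x) = G(x)**a for a shape parameter a > 0."""
    return baseline.cdf(x) ** a

def exp_g_pdf(x, a, baseline):
    """Exponentiated-G PDF: f(x) = a * g(x) * G(x)**(a - 1)."""
    return a * baseline.pdf(x) * baseline.cdf(x) ** (a - 1)

# Example: an exponentiated-Weibull distribution generated from a Weibull baseline.
baseline = stats.weibull_min(c=1.5, scale=2.0)
print(exp_g_cdf(4.0, a=0.7, baseline=baseline))                   # CDF of the generated family
print(quad(lambda t: exp_g_pdf(t, 0.7, baseline), 0, np.inf)[0])  # density integrates to ~1
```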
3. Statistics for Reliability Modeling

This chapter provides a short summary of fundamental ideas in reliability theory and inference. The first part of the chapter covers lifetime distributions used in engineering reliability analysis, including general properties of reliability distributions that pertain to the lifetimes of manufactured products. Certain distributions are formulated on the basis of simple physical properties, and others are more or less empirical. The first part of the chapter ends with a description of graphical and analytical methods for finding appropriate lifetime distributions for a set of failure data. The second part of the chapter describes statistical methods for analyzing reliability data, including maximum likelihood estimation (both parametric and nonparametric) and likelihood ratio testing. Degradation data are more prevalent in experiments in which failure is rare and test time is limited. Special regression techniques for degradation data can be used to draw inference on the underlying lifetime distribution, even if failures are rarely observed. The last part of the chapter discusses reliability for systems. Along with the components that comprise the system, reliability analysis must take account of the system configuration and (stochastic) component dependencies. System reliability is illustrated with an analysis of logistics systems (e.g., moving goods in a system of product sources and retail outlets). Robust reliability design can be used to construct a supply chain that runs with maximum efficiency or minimum cost.

Paul Kvam, Jye-Chyi Lu
4. Functional Data Analysis

This chapter introduces functional data analysis (FDA) and selected topics in FDA, including functional principal component analysis (FPCA) and functional linear regression (FLR), with real data applications using a publicly available software package. The methods in this chapter are based on local polynomial regression, a basic and important smoothing technique in nonparametric and semiparametric statistics. The approaches included in this chapter are not limited to the analysis of dense functional data but can also be used for the analysis of sparse functional/longitudinal data. In Sect. 4.1, we introduce FDA with some interesting examples of functional data and briefly describe FPCA and FLR. Section 4.2 details FPCA, one of the most important topics and tools in FDA. Topics such as the estimation of mean and covariance functions using nonparametric smoothing, choosing the number of principal components (PCs) using subjective and objective methods, and prediction of trajectories are included and illustrated using a publicly available bike-sharing data set. Section 4.3 presents FLR based on the FPCA described in Sect. 4.2. FLR is a generalization of traditional linear regression to the case of functional data. It is a powerful tool for modeling the relationship between a functional/scalar response and functional predictors. This section is also illustrated using the same bike-sharing data set. We focus on the case when both response and predictor are functions, but we mention other types of FLR in Sect. 4.4. Section 4.4 presents a short overview of other selected topics and software packages in FDA. These topics either concern functional data with more complex features than the basic ones included in the previous two sections or concern other statistical estimation and inference not covered before. The statistical software packages used in this chapter are written in Matlab and may be appropriate for the analysis of some basic types of functional data but not for others. Section 4.4 describes other software packages written in different languages, such as R; those packages have the flexibility to handle various problems and different types of functional data.

Yuhang Xu
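
The following hedged sketch illustrates the basic FPCA computation for densely observed curves by eigendecomposing the sample covariance on a common grid; it omits the presmoothing, grid-spacing normalization, and sparse-data machinery discussed in the chapter, and the simulated curves are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dense functional data: 100 curves observed on a common grid of 50 points,
# generated from two smooth basis functions plus noise.
t = np.linspace(0, 1, 50)
true_scores = rng.normal(size=(100, 2)) * np.array([2.0, 0.8])
basis = np.vstack([np.sqrt(2) * np.sin(2 * np.pi * t), np.sqrt(2) * np.cos(2 * np.pi * t)])
X = true_scores @ basis + rng.normal(scale=0.2, size=(100, 50))

# FPCA for dense data: center the curves and eigendecompose the sample covariance matrix.
mu = X.mean(axis=0)
Xc = X - mu
cov = Xc.T @ Xc / (X.shape[0] - 1)
evals, evecs = np.linalg.eigh(cov)
evals, evecs = evals[::-1], evecs[:, ::-1]           # sort eigenvalues in decreasing order

# Fraction of variance explained and FPC scores for the first two components.
fve = evals[:2].sum() / evals.sum()
pc_scores = Xc @ evecs[:, :2]
print(f"fraction of variance explained by 2 PCs: {fve:.3f}")
```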
5. Symmetric Geometric Skew Normal Regression Model

Recently, Kundu (Sankhya Ser B, 167–189, 2014) proposed the geometric skew normal (GSN) distribution as an alternative to Azzalini's skew normal (ASN) distribution. The GSN distribution can be skewed and, unlike the ASN distribution, it can also be heavy tailed as well as multimodal. It can easily be extended to the multivariate case. The multivariate geometric skew normal (MGSN) distribution also has several desirable properties. In this paper, we propose a symmetric geometric skew normal (SGSN) distribution as an alternative to symmetric distributions such as the normal distribution, the log Birnbaum-Saunders (BS) distribution, and Student's t distribution. It is a very flexible class of distributions, of which the normal distribution is a special case. The proposed model has three unknown parameters, and the maximum likelihood (ML) estimators of the unknown parameters cannot be obtained in explicit form. In this paper, we propose a very efficient expectation maximization (EM) algorithm, and it is observed that the proposed EM algorithm works very well. We further consider a location-shift SGSN regression model. It is more flexible than the standard Gaussian regression model. The ML estimators of the unknown parameters are obtained based on the EM algorithm. Extensive simulation experiments and the analyses of two data sets are presented to show the effectiveness of the proposed model and the estimation techniques.

Debasis Kundu, Deepak Prajapati
6. Statistical Analysis of Modern Reliability Data

Reliability analysis has traditionally used time-to-event data, degradation data, and recurrent event data, while the associated covariates tend to be simple and constant over time. Over the past years, we have witnessed rapid development of sensor and wireless technology, which enables us to track product usage and the use environment. Nowadays, we are able to collect richer information on covariates, which provides opportunities for better reliability predictions. In this chapter, we first review recent developments in statistical methods for reliability analysis. We then focus on introducing several specific methods that were developed for different types of reliability data by utilizing the covariate information. Illustrations of those methods are also provided using examples from industry. We also provide a brief review of recent developments in test planning and then focus on illustrating sequential Bayesian designs with examples of fatigue testing for polymer composites. The chapter is concluded with some discussion and remarks.

Yueyao Wang, I-Chen Lee, Lu Lu, Yili Hong
7. Mathematical Reliability Aspects of Multivariate Probability Distributions

This work deals with a comprehensive solution to the problem of finding the joint k-variate probability distributions of random vectors (X1, …, Xk), given all the univariate marginals. The general and universal analytic form of all solutions, given the fixed (but arbitrary) univariate marginals, is given in a proven theorem. In order to choose among these solutions, one needs to determine proper "dependence functions" (joiners) that impose specific stochastic dependences among subsets of the set {X1, …, Xk} of the underlying random variables. Some methods of finding such dependence functions, given the fixed marginals, were discussed in our previous papers (Filus and Filus, J Stat Sci Appl 5:56–63, 2017; Filus and Filus, General method for construction of bivariate stochastic processes given two marginal processes. Presentation at the 7th International Conference on Risk Analysis, ICRA 7, Northeastern Illinois University, Chicago, 4 May 2017). In applications, such as system reliability modeling and others, among all the available k-variate solutions one needs to choose those that may fit particular data and, after that, test the chosen models by proper statistical methods. The theoretical aspect of the main model, given by formula (7.3) in Sect. 7.2, mainly relies on the existence of one [for any fixed set of univariate marginals] general and universal form which plays the role of a paradigm describing the whole class of k-variate probability distributions for arbitrary k = 2, 3, …. An important fact is that the initial marginals are arbitrary and, in general, each may belong to a different class of probability distributions. Additional analysis and discussion are provided.

Lidia Z. Filus, Jerzy K. Filus
8. Introduction to Stochastic Processes

This chapter briefly discusses stochastic processes, including Markov processes, Poisson processes, renewal processes, quasi-renewal processes, and nonhomogeneous Poisson processes. The chapter also provides a short list of books for readers who are interested in advanced topics in stochastic processes.

Hoang Pham
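
As a minimal hedged companion to the processes listed above, the sketch below simulates a homogeneous Poisson process by accumulating independent exponential interarrival times; the rate and horizon are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def poisson_process(rate, horizon):
    """Simulate event times of a homogeneous Poisson process on [0, horizon]
    by accumulating independent exponential interarrival times."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate)
        if t > horizon:
            return np.array(times)
        times.append(t)

# The expected number of events on [0, 10] with rate 2 is 20.
events = poisson_process(rate=2.0, horizon=10.0)
print(len(events), events[:5])
```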
9. Progressive Censoring Methodology

Progressive censoring has received great attention in the last decades, especially in life testing and reliability. This review highlights fundamental applications, related models, and probabilistic and inferential results for progressively censored data. Based on the fundamental models of progressive type I and type II censoring, we present related models such as adaptive and hybrid censoring as well as, e.g., stress-strength and competing risk models for progressively censored data. Focusing on exponentially and Weibull distributed lifetimes, an extensive bibliography emphasizing recent developments is provided.

Narayanaswamy Balakrishnan, Erhard Cramer
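
A hedged sketch of how a progressive Type-II censored sample can be generated by direct simulation, following the definition of the scheme (after each observed failure, a prespecified number of surviving units is withdrawn at random); this illustrates the censoring mechanism only, not the inferential procedures reviewed in the chapter, and the Weibull lifetimes and removal scheme are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

def progressive_type2_sample(lifetimes, removals):
    """Generate a progressive Type-II censored sample by direct simulation:
    after the i-th observed failure, withdraw removals[i] surviving units at random.
    Requires len(lifetimes) == len(removals) + sum(removals)."""
    alive = list(lifetimes)
    observed = []
    for r in removals:
        i = int(np.argmin(alive))                     # next failure among units on test
        observed.append(alive.pop(i))
        drop = set(rng.choice(len(alive), size=r, replace=False).tolist())
        alive = [x for j, x in enumerate(alive) if j not in drop]
    return np.array(observed)

# n = 10 Weibull lifetimes, m = 4 observed failures, removal scheme R = (2, 1, 2, 1).
lifetimes = 2.0 * rng.weibull(1.5, size=10)
x = progressive_type2_sample(lifetimes, removals=[2, 1, 2, 1])
print(x)        # the observed failure times are increasing by construction
```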
10. Warranty Policies: Analysis and Perspectives

Warranty is a topic that has been studied extensively by different disciplines, including engineering, economics, management science, accounting, and marketing (Blischke and Murthy, Warranty cost analysis. Marcel Dekker, New York, 1994, p 47). A warranty policy is a guarantee by the seller of a product to provide the buyer with a specific service, such as replacement or repair, in the event of product failure. Today, warranty policy is an important marketing factor used by manufacturers and corporations to promote their products to consumers (Park and Pham, IEEE Trans Syst Man Cybern A 40:1329–1340, 2010). This chapter aims to provide an overview of warranties, focusing on the cost and benefit perspectives of various warranty and maintenance policies. After a brief introduction to the current status of warranty research, the second part of this chapter classifies various existing and several recent promotional warranty policies to extend the taxonomy initiated by Blischke and Murthy (Eur J Oper Res 62:127–148, 1993). Focusing on the quantitative modeling perspective of both the cost and benefit analyses of warranties, we summarize five problems that are essential to warranty issuers: (i) what are the warranty cost factors; (ii) how to compare different warranty policies; (iii) how to analyze the warranty cost of multi-component systems; (iv) how to evaluate the warranty benefits; (v) how to determine the optimal warranty policy. A list of future warranty research topics is presented in the last part of this chapter. We hope that this will stimulate further interest among researchers and practitioners.

Hoang Pham, Jun Bai

Process Monitoring and Improvement

Frontmatter
11. Statistical Methods for Quality and Productivity Improvement

The first section of this chapter introduces statistical process control (SPC) and robust design (RD), two important statistical methodologies for quality and productivity improvement. Section 11.1 describes in depth SPC theory and tools for monitoring independent and autocorrelated data with a single quality characteristic. The relationship between SPC methods and automatic process control methods is discussed, and differences in their philosophies, techniques, efficiencies, and design are contrasted. SPC methods for monitoring multivariate quality characteristics are also briefly reviewed. Section 11.2 considers univariate RD, with emphasis on experimental design, performance measures, and modeling of the latter. Combined and product arrays are featured, and the performance measures examined include signal-to-noise ratios (SNR), PerMIAs, process response, process variance, and desirability functions. Of central importance is the decomposition of the expected value of squared-error loss into variance and off-target components, which sometimes allows the dimensionality of the optimization problem to be reduced. In addition, this section deals with multivariate RD, demonstrates that the objective function for the multiple-characteristic case is typically formed by additive or multiplicative combination of the univariate objective functions, and lists RD case studies originating from applications in manufacturing, reliability, and tolerance design. Section 11.3 discusses the mainstream methods used in the prognostics and health management (PHM) framework, including updated research from the literature of both statistical science and engineering. Additionally, this section provides an overview of the systems health monitoring and management (SHMM) framework, discusses its basic structure, and lists several applications of SHMM to complex systems and to critical components within the context of a big data environment.

Wei Jiang, Terrence E. Murphy, Kwok-Leung Tsui, Yang Zhao
12. Chain Sampling

A brief introduction to the concept of chain sampling for quality inspection is first presented. The chain sampling plan of type ChSP-1 selectively chains past inspection results. A discussion of the design and application of ChSP-1 plans is presented in the second section of this chapter. Various extensions of chain sampling plans, such as the ChSP-4 plan, are discussed in the third part. Representation of the ChSP-1 plan as a two-stage cumulative results criterion plan and its design are discussed in the fourth part. The fifth section relates to a modification of the ChSP-1 plan which results in sampling economy. The sixth section of this chapter is on the relationship between chain/dependent sampling and deferred sentencing types of plans. A review of sampling inspection plans that are based on the ideas of chain or dependent sampling or deferred sentencing is also made in this section. A large number of recent publications based on the idea of chaining past and future lot results are also reviewed. The economics of chain sampling compared to the two-plan quick switching system is discussed in the seventh section. The eighth section extends the attribute chain sampling rule to variables inspection. In the ninth section, chain sampling is compared with the well-known CUSUM approach for attribute data. The tenth section gives several other interesting extensions, such as chain sampling for mixed inspection and process control. The final section gives the concluding remarks.

Govindaraju Kondaswamy
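
As a hedged numerical companion, the sketch below evaluates the operating characteristic of Dodge's ChSP-1 plan in its standard textbook form (accept on zero nonconforming items in the current sample of n, or on exactly one provided the preceding i samples each had zero), so Pa(p) = P(0; n, p) + P(1; n, p)·P(0; n, p)^i; the plan parameters are illustrative.

```python
from scipy.stats import binom

def chsp1_oc(p, n, i):
    """Operating characteristic of the ChSP-1 chain sampling plan:
    accept on 0 nonconforming in the current sample of n, or on exactly 1
    provided the previous i samples each contained 0 nonconforming.
    Pa(p) = P(0; n, p) + P(1; n, p) * P(0; n, p)**i."""
    p0 = binom.pmf(0, n, p)
    p1 = binom.pmf(1, n, p)
    return p0 + p1 * p0 ** i

# Compare chaining i = 2 past lots against the single-sample c = 0 plan (the i -> infinity limit).
for p in (0.005, 0.01, 0.03):
    print(p, round(chsp1_oc(p, n=20, i=2), 4), round(binom.pmf(0, 20, p), 4))
```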
13. Six Sigma

Six Sigma, which was first launched by Motorola in the late 1980s, has become a successful standard quality initiative for achieving and maintaining excellent business performance in today's manufacturing and service industries. In this chapter, we provide a systematic and principled introduction to Six Sigma from its various facets. The first part of this chapter describes what Six Sigma is, why we need Six Sigma, and how to implement Six Sigma in practice. A typical business structure for Six Sigma implementation is introduced, and potential failure modes of Six Sigma are also discussed. The second part describes the core methodology of Six Sigma, which consists of five phases: Define, Measure, Analyze, Improve, and Control (DMAIC). Specific operational steps in each phase are described in sequence. Key tools to support the DMAIC process, including both statistical tools and management tools, are also presented. The third part highlights a specific Six Sigma technique for product development and service design, Design for Six Sigma (DFSS), which differs from DMAIC. DFSS also has five phases: Define, Measure, Analyze, Design, and Verify (DMADV), spread over product development. Each phase is described, and the corresponding key tools to support each phase are presented. In the fourth part, a real case study on printed circuit board (PCB) improvement is used to demonstrate the application of Six Sigma. The company and process background are provided. The DMAIC approach is followed, and key supporting tools are illustrated accordingly. At the end, the financial benefit of this case is realized through the reduction of the cost of poor quality (COPQ). The fifth part provides a discussion of Six Sigma in the current Big Data context. A brief introduction to Big Data is first given, and then the tremendous opportunities offered by Big Data analytics to the core methodology of Six Sigma, i.e., DMAIC, are outlined in detail. The capabilities of each phase that would be greatly enhanced are emphasized. Finally, the last part is given to conclusions and a discussion of the prospects of Six Sigma.

Fugee Tsung, Kai Wang
14. Statistical Models for Monitoring the High-Quality Processes

One important application of statistical models in industry is statistical process control. Many control charts have been developed and used in industry. They are easy to use but have been developed based on statistical principles. However, for today's high-quality processes, traditional control-charting techniques are not applicable in many situations. Research has been ongoing in the last few decades, and new methods have been proposed. This chapter summarizes some of these techniques. High-quality processes are generally defined as those with a very low defective rate or defect-occurrence rate, as achieved in Six Sigma and advanced manufacturing environments. Control charts based on the cumulative count of conforming items are recommended for such processes. The use of such charts has opened up new frontiers in the research and applications of statistical control charts in general. In this chapter, several extended or modified statistical models are described. They are useful when the simple and basic geometric distribution is not appropriate or is insufficient. In particular, we present some extended Poisson distribution models that can be used for count data with large numbers of zero counts. We also extend the chart to the case of general time-between-events monitoring; such an extension can be useful in service or reliability monitoring. Traditionally, the exponential distribution is used for modeling time between events, although other distributions such as the Weibull or gamma distribution can also be used in this context.

Min Xie, Thong Ngee Goh, Tahir Mahmood
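
A hedged sketch of the cumulative count of conforming (CCC) chart idea mentioned above: under a geometric model for the count of conforming items between nonconforming ones, probability limits follow from P(X > x) = (1 − p)^x; the nonconforming rate and false-alarm probability below are illustrative.

```python
import numpy as np

def ccc_limits(p, alpha=0.0027):
    """Probability limits for a cumulative count of conforming (CCC) chart.
    The plotted statistic is the number of conforming items inspected until a
    nonconforming one appears; under a geometric model P(X > x) = (1 - p)**x,
    so solving for the tail probabilities alpha/2 gives the limits below."""
    ucl = np.log(alpha / 2) / np.log(1 - p)
    lcl = np.log(1 - alpha / 2) / np.log(1 - p)
    return lcl, ucl

# A high-quality process running at about 100 ppm nonconforming.
lcl, ucl = ccc_limits(p=1e-4)
print(f"LCL = {lcl:.1f}, UCL = {ucl:.1f} conforming items")
```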
15. Statistical Management and Modeling for Demand Spare Parts

In recent years, increased emphasis has been placed on improving decision-making in business and government. A key aspect of decision-making is being able to predict the circumstances surrounding individual decision situations. Examining the diversity of requirements in planning and decision-making situations, it is clear that no single forecasting method or narrow set of methods can meet the needs of all decision-making situations. Moreover, these methods are strongly dependent on factors such as data quantity, pattern, and accuracy, which reflect their inherent capabilities and adaptability, such as intuitive appeal, simplicity, ease of application, and, last but not least, cost. Section 15.1 frames the demand forecasting problem as one of the biggest challenges in the repair and overhaul industry; after this brief introduction, Sect. 15.2 summarizes the most important categories of forecasting methods; Sects. 15.3 and 15.4 approach the forecasting of spare parts first as a theoretical construct, but some industrial applications and results are added from the field, as in many other parts of this chapter. Section 15.5 addresses the question of the optimal stock level for spare parts, with particular regard to Low Turnaround Index (LTI) parts conceived and designed to satisfy a specific customer request, by applying classical Poisson methods of minimal availability and minimum cost; similar considerations are drawn and compared in Sect. 15.6, dealing with models based on the binomial distribution. An innovative extension of binomial models based on a total cost function is discussed in Sect. 15.7. Finally, Sect. 15.8 adds the Weibull failure rate function to the LTI spare parts stock level in maintenance systems with declared wear conditions.

Emilio Ferrari, Arrigo Pareschi, Alberto Regattieri, Alessandro Persona
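
As a hedged illustration of the classical Poisson approach to LTI spare parts mentioned above, the sketch below finds the smallest stock level whose probability of covering lead-time demand meets a target; the failure rate, fleet size, lead time, and target are hypothetical.

```python
from scipy.stats import poisson

def poisson_stock_level(failure_rate, n_units, lead_time, target):
    """Smallest spare-parts stock s such that the probability of covering demand
    during the replenishment lead time is at least `target`, with demand modeled
    as Poisson with mean = failure_rate * n_units * lead_time."""
    mean_demand = failure_rate * n_units * lead_time
    s = 0
    while poisson.cdf(s, mean_demand) < target:
        s += 1
    return s, mean_demand

# Hypothetical LTI part: 0.002 failures/hour per unit, 25 installed units, 200-hour lead time.
s, mu = poisson_stock_level(0.002, 25, 200, target=0.95)
print(f"mean lead-time demand = {mu:.1f}, recommended stock level = {s}")
```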
16. D-Efficient Mixed-Level Foldover Designs for Screening Experiments

Definitive screening designs (DSDs) are a new class of three-level screening designs proposed by Jones and Nachtsheim [3] which require only 2m + 1 runs for experiments with m three-level quantitative factors. The design matrices for DSDs are of the form (C′, −C′, 0)′, where C is a (0, ±1) submatrix with zero diagonal and 0 is a column vector of 0's. This paper reviews recent developments in D-efficient mixed-level foldover designs for screening experiments. It then discusses a fast coordinate-exchange algorithm for constructing D-efficient DSD-augmented designs (ADSDs). This algorithm provides a new class of conference matrix-based mixed-level foldover designs (MLFODs) for screening experiments as introduced by Jones and Nachtsheim [4]. In addition, the paper provides an alternative class of D-efficient MLFODs and an exhaustive algorithm for constructing the new designs. A case study comparing two candidate MLFODs for a large mixed-level screening experiment with 17 factors is used to demonstrate the properties of the new designs.

Nam-Ky Nguyen, Ron S. Kenett, Tung-Dinh Pham, Mai Phuong Vuong
17. Censored Data Prediction Based on Model Selection Approaches

This chapter addresses two common problems in data analysis. The first contribution is that we propose two methods to predict censored data. The second contribution is to select the most suitable distribution function automatically instead of by subjective judgment. In this chapter, we propose three model selection approaches. To demonstrate our approach, two members of the location-scale family, the normal distribution and the smallest extreme value distribution, are used as candidates to illustrate the competition for the best underlying distribution via the proposed prediction methods. According to the results of Monte Carlo simulations, model misspecification has an impact on prediction precision, and the three proposed model selection approaches perform well when more than one candidate distribution competes for the best underlying model. Finally, the proposed approaches are applied to three data sets. This chapter is based on Chiang et al. (Math Probl Eng, 3465909, 2018).

Tzong-Ru Tsai, Jyun-You Chiang, Shuai Wang, Yan Qin
18. Monitoring Coefficient of Variation Using CUSUM Control Charts

In the field of statistical process control, the cumulative sum (CUSUM) control chart is used as a powerful tool to detect process shifts. One of the main features of the CUSUM control chart is that it takes into account the past information at each sampling time of the process. Recently, the rapid development of optimization algorithms and software has made the CUSUM chart easier to implement. As a result, the CUSUM control chart has been applied increasingly widely. The goal of this chapter is to present some recent innovative CUSUM control charts for monitoring the coefficient of variation (CV). We address several problems related to the CUSUM chart monitoring the CV. The first section provides important definitions for a CUSUM control chart, including the CUSUM sequence, the CUSUM statistics, the implementation of a CUSUM control chart, the average run length (ARL), and the expected average run length (EARL). In the second section, we investigate the effect of measurement error on the CUSUM control chart monitoring the CV. Finally, a fast initial response strategy to improve the performance of the CUSUM control chart is presented.

Phuong Hanh Tran, Huu Du Nguyen, Cédric Heuchenne, Kim Phuc Tran
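
As a hedged, simplified illustration of the chart's logic (not the specific CV CUSUM statistics or ARL-based designs developed in the chapter), the sketch below runs an upward CUSUM on the sample coefficient of variation of successive subgroups; the reference value k and decision limit h are illustrative rather than ARL-calibrated.

```python
import numpy as np

rng = np.random.default_rng(3)

def cv_cusum(subgroups, mu0, k):
    """One-sided (upward) CUSUM on subgroup coefficients of variation:
    C_t = max(0, C_{t-1} + (cv_t - mu0 - k))."""
    c, path = 0.0, []
    for sub in subgroups:
        cv = sub.std(ddof=1) / sub.mean()
        c = max(0.0, c + (cv - mu0 - k))
        path.append(c)
    return np.array(path)

# 30 in-control subgroups (CV ~ 0.10) followed by 20 shifted subgroups (CV ~ 0.20).
ic = rng.normal(100, 10, size=(30, 5))
oc = rng.normal(100, 20, size=(20, 5))
path = cv_cusum(np.vstack([ic, oc]), mu0=0.10, k=0.02)

h = 0.15                                   # illustrative decision limit, not ARL-calibrated
alarms = np.where(path > h)[0]
print("first alarm at subgroup:", int(alarms[0]) if alarms.size else None)
```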
19. Change-Point-Based Statistical Process Controls

In the current era of computers, statistical monitoring of sequential observations is an important research area. In problems such as monitoring the quality of industrial products, health variables, climatological variables, etc., we are often interested in detecting a change in the process distribution in general, not just in mean or variance. We first briefly discuss a few commonly used SPC charts along with relevant references and then present a new chart for univariate continuous processes. Unlike most SPC charts in the literature, it neither assumes any “in-control” probability distribution nor requires any “in-control” Phase I data, and it aims to detect arbitrary distributional change. This chart uses a computationally efficient method to find the possible change-point. Moreover, the proposed chart uses a p-value-based data pruning approach to further increase the efficiency, and it combines the strengths of two different tests of hypotheses, which has a potentially broad application. Numerical simulations and two real-data analyses show that the chart can be used in various monitoring problems when the nature of distributional change is unknown.

Partha Sarathi Mukherjee

Reliability Models and Survival Analysis

Frontmatter
20. Reliability Characteristics and Optimization of Warm Standby Systems

This chapter presents reliability characteristics and optimal redundancy allocation of k-out-of-n warm standby systems consisting of identical components having exponential time-to-failure distributions. It is shown that the state probabilities of the warm standby system can be represented using the formulas that apply to active redundancy systems. Subsequently, it is shown that all properties and computational procedures that apply to active redundancy also apply to warm standby redundancy. The new results prove that the system reliability can be computed using robust and efficient computational algorithms with O(n − k + 1) time complexity. Further, it is proved that the time-to-failure distribution of the k-out-of-n warm standby system is beta-exponential. Using this property, closed-form expressions for various reliability characteristics and statistical measures of system failure time are presented. It is shown that the system reliability function is log-concave in n, and this property is used to develop efficient algorithms for determining optimal system configurations. Because active redundancy is a special case of warm standby redundancy, this chapter indirectly provides some new results for the active redundancy case as well.

Suprasad V. Amari, Hoang Pham
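
A hedged Monte Carlo check of the k-out-of-n warm standby setting described above: with identical exponential units (active rate λ, dormant rate θ) and j units already failed (j ≤ n − k), the time to the next unit failure is exponential with rate kλ + (n − k − j)θ, and the system fails at the (n − k + 1)-th unit failure. This simulation is only a sanity-check companion to the chapter's closed-form results; the rates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)

def warm_standby_failure_times(n, k, lam, theta, n_sim=100_000):
    """System failure times of a k-out-of-n warm standby system with identical
    exponential units: active failure rate lam, dormant (standby) rate theta.
    With j units failed (j = 0..n-k), k units are active and n-k-j are dormant,
    so the time to the next failure is Exp(k*lam + (n-k-j)*theta); the system
    fails at the (n-k+1)-th unit failure."""
    t = np.zeros(n_sim)
    for j in range(n - k + 1):
        rate = k * lam + (n - k - j) * theta
        t += rng.exponential(1.0 / rate, size=n_sim)
    return t

# 2-out-of-4 system, active rate 0.01/h, standby rate 0.002/h: estimate R(100).
ft = warm_standby_failure_times(n=4, k=2, lam=0.01, theta=0.002)
print("estimated R(100) =", np.mean(ft > 100.0))
```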
21. Importance and Sensitivity Analysis on Complex Repairable Systems and Imprecise System Reliability

The identification of the components that are responsible for the performance of a system is a key task in improving safety and reliability. However, such analysis is difficult and challenging when large and complex systems are analyzed. The recently introduced survival signature not only retains the merits of the earlier system signature but also deals efficiently with complex systems with multiple component types. In real engineering applications, components can be repaired after failure. Hence, it is essential to identify which component or component set is most critical to the complex repairable system. What is more, due to lack of data or confidentiality, it can be difficult to know the full configuration of the system, which leads to an imprecise survival signature. In order to address these questions, efficient simulation approaches based on the structure function and the survival signature have been proposed, respectively, to analyze complex repairable systems. Based on this, a component importance index has been introduced to perform sensitivity analysis on a specific component or a set of components within the repairable system. In addition, the proposed simulation method can be used to deal with imprecision within the survival signature. Numerical examples are presented to show the applicability of these approaches.

Geng Feng
22. Hardware and Software Reliability, Verification, and Testing

Hardware and software together are now part and parcel of almost all modern devices. Hence, the study of both hardware reliability and software reliability has become very important in order to ensure availability of these devices. There are several distinct differences between hardware and software, and hence, even though the definition of reliability remains the same in both cases, estimating the reliability of hardware may call for a different methodology than that for software. Since software can be neither seen nor touched, estimating its reliability is inherently difficult. In this chapter we briefly discuss the concepts and methodologies adopted to determine the reliability of software and hardware. We also discuss some basic differences between hardware and software. A few important methods used for estimating hardware and software reliability are discussed in brief. A thorough bibliography is provided for readers who wish to look into the details of the methodologies wherever required.

Ashis Kumar Chakraborty, E. V. Gijo, Anisha Das, Moutushi Chatterjee
23. OSS Reliability Analysis and Project Effort Estimation Based on Computational Intelligence

OSS (open-source software) systems serve as key components of critical infrastructures in society. In the OSS development paradigm, bug tracking systems are used for software quality management in many OSS projects. It is important to appropriately control quality throughout the progress of an OSS project, because software failures are often caused by poor effort control. In particular, the GUI of OSS frequently changes dramatically with major version upgrades. Changes in the GUI of OSS depend on the development and management effort devoted to the specified version. Considering the relationship between the GUI and the OSS development process, the UX/UI design of OSS changes as OSS development proceeds. This chapter focuses on a method of effort estimation for OSS projects. Pixel data and OSS fault big data are then analyzed by using deep learning. Moreover, we discuss an effort assessment method for the development phase based on the effort data.

Shigeru Yamada, Yoshinobu Tamura, Kodai Sugisaki
24. Vulnerability Discovery Analysis in Software Reliability and Related Optimization Problems

The recent rapid advancement in technology has affected the security of software products. Threats and cyber-attacks are intensifying both in number and in complexity. Therefore, software systems require protection against threats and vulnerabilities. When defects in software have an effect on the security of the software system, these defects are called vulnerabilities. It is essential for vendors to rigorously identify and remove vulnerabilities present in the system. This chapter aims to explain the vulnerability discovery and patching process mathematically. A patch is a security update released by software developers to eliminate vulnerabilities from the system. Quantitative measures are discussed in the present study to predict the vulnerability discovery growth function by incorporating various attributes, namely, software users, operational effort, and coverage functions. A joint optimization problem for optimal software and patch time-to-market is also discussed, with the aim of minimizing the cost functions. Numerical examples are provided to validate the mathematical models and the minimization problem using actual vulnerability data sets. The results indicate that the discussed models can objectively determine the vulnerability discovery paradigm. Moreover, the optimization models will assist the management team in optimal decision-making pertaining to the release time of software and security patches in the market.

P. K. Kapur, Saurabh Panwar
25. Software Reliability Modeling and Prediction

After a brief overview of existing models in software reliability in Sect. 25.1, Sect. 25.2 discusses a generalized nonhomogeneous Poisson process model that can be used to derive most existing models in the software reliability literature. Section 25.3 describes a generalized random field environment (RFE) model incorporating both the testing phase and the operating phase of the software development cycle for estimating the reliability of software systems in the field. In contrast to some existing models that assume the same software failure rate for the software testing and field operation environments, this generalized model considers the random environmental effects on software reliability. Based on the generalized RFE model, Sect. 25.4 describes two specific RFE reliability models, the γ-RFE and β-RFE models, for predicting software reliability in field environments. Section 25.5 illustrates the models using telecommunication software failure data. Some further considerations based on the generalized software reliability model are also discussed.

Hoang Pham, Xiaolin Teng
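
As a hedged illustration of NHPP-based software reliability modeling (using the classic Goel-Okumoto mean value function m(t) = a(1 − e^{−bt}) as a simple stand-in, fitted here by nonlinear least squares rather than maximum likelihood), the failure-count data below are invented for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

def go_mean_value(t, a, b):
    """Goel-Okumoto NHPP mean value function m(t) = a * (1 - exp(-b * t))."""
    return a * (1.0 - np.exp(-b * t))

# Hypothetical cumulative failure counts recorded at the end of each test week.
weeks = np.arange(1, 13, dtype=float)
cum_failures = np.array([8, 15, 21, 26, 30, 33, 36, 38, 40, 41, 42, 43], dtype=float)

(a_hat, b_hat), _ = curve_fit(go_mean_value, weeks, cum_failures, p0=(50.0, 0.1))
print(f"expected total faults a = {a_hat:.1f}, detection rate b = {b_hat:.3f}, "
      f"expected remaining faults = {a_hat - cum_failures[-1]:.1f}")
```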
26. Statistical Maintenance Modeling

The first part of this chapter provides a brief introduction to statistical maintenance modeling subject to multiple failure processes. It includes a description of general probabilistic degradation processes. The second part discusses detailed reliability modeling for degraded systems subject to competing failure processes without maintenance actions. A generalized multi-state degraded-system reliability model with multiple competing failure processes, including degradation processes and random shocks, is presented. The operating condition of the multi-state system is characterized by a finite number of states. A methodology to generate the system states when multi-failure processes exist is also discussed. The model can be used not only to determine the reliability of the degraded systems in the context of multi-state functions but also to obtain the probabilities of being in a given state of the system. The third part describes the inspection–maintenance issues and reliability modeling for degraded repairable systems with competing failure processes. A generalized condition-based maintenance model for inspected degraded systems is discussed. An average long-run maintenance cost rate function is derived based on an expression for degradation paths and cumulative shock damage, which are measurable. An inspection sequence is determined based on the minimal maintenance cost rate. Upon inspection, a decision will be made on whether to perform preventive maintenance or not. The optimum preventive maintenance thresholds for degradation processes and inspection sequences are also determined based on a modified Nelder–Mead downhill simplex method. The fourth part briefly discusses some dependent competing risk models with various applications subject to multiple degradation processes and random shocks, especially using time-varying copulas. Finally, the last part is given over to the conclusions and a discussion of future perspectives for degraded-system maintenance modeling.

Hoang Pham, Wenjian Li
27. Continuous-Time Predictive Maintenance Modeling with Dynamic Decision Framework

Digital technologies improve the information collected on systems and allow the development of condition-based maintenance policies and models using the remaining useful life. Accordingly, maintenance policies have evolved from a simple time-based approach to a more complex and competitive predictive approach. However, considering a dynamic maintenance decision framework with a self-adaptive decision rule has not been thoroughly addressed. This chapter deals with continuously deteriorating systems and focuses on dynamic maintenance policies, i.e., policies using real-time information to update the decision rule and handle the model's uncertainty. The first part presents popular stochastic processes for degradation modeling and condition-based maintenance decision rules. Then, dynamic maintenance policies are described in two different contexts: for groupings of maintenance actions and for reducing uncertainty in modeling. Finally, a particular case of a dynamic preventive maintenance model is described in detail for a system with continuous degradation and unknown degradation parameters. It is based on the inverse Gaussian process with a nonperiodic inspection policy and includes parameter updating.

Antoine Grall, Elham Mosayebi Omshi
28. Stochastic Redundant Replacement Maintenance Models

Many serious accidents have happened as systems have become large-scale and complex; moreover, advanced nations have largely completed their infrastructures and entered a maintenance period. Maintenance is becoming more important than production and construction for environmental considerations and the protection of natural resources. A variety of maintenance policies have been established in reliability theory to prevent failures of target systems. It is well known that high system reliability can be achieved through redundancy. Alternatively, several maintenance policies may be planned simultaneously, such as bivariate, trivariate, and multivariate policies with multiple maintenance plans. This chapter takes up age and periodic replacements, which are the most standard maintenance policies, shows their optimal policies, and proposes redundant replacement policies with time T and n kinds of replacements. The results obtained in this chapter can be applied to maintenance for redundant systems, imperfect repair, and several failure modes.

Toshio Nakagawa, Satoshi Mizutani, Xufeng Zhao
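
A hedged sketch of the classical age replacement policy mentioned above: replace preventively at age T at cost c_p or at failure at cost c_f, and minimize the long-run cost rate C(T) = [c_f F(T) + c_p R(T)] / ∫₀^T R(u) du numerically; the Weibull lifetime and costs are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Weibull lifetime with increasing failure rate (shape > 1) and illustrative costs.
life = stats.weibull_min(c=2.5, scale=1000.0)   # hours
c_f, c_p = 5000.0, 800.0                        # failure vs. preventive replacement cost

def cost_rate(T):
    """Long-run expected cost per unit time of age replacement at age T:
    C(T) = [c_f * F(T) + c_p * R(T)] / integral_0^T R(u) du."""
    F = life.cdf(T)
    expected_cycle_length, _ = quad(life.sf, 0, T)
    return (c_f * F + c_p * (1 - F)) / expected_cycle_length

res = minimize_scalar(cost_rate, bounds=(50.0, 3000.0), method="bounded")
print(f"optimal replacement age ~ {res.x:.0f} h, cost rate ~ {res.fun:.3f} per h")
```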

Advanced Statistical Methods and Modeling

Frontmatter
29. Confidence Distribution and Distribution Estimation for Modern Statistical Inference

This chapter introduces readers to the new concept and methodology of confidence distribution and modern-day distributional inference in statistics. This discussion should be of interest to people who would like to go deeper into statistical inference methodology and to utilize distribution estimators in practice. We also include in the discussion the topic of generalized fiducial inference, a special type of modern distributional inference, and relate it to the concept of confidence distribution. Several real-data examples are also provided for practitioners. We hope that the selected content covers the greater part of the developments on this subject.

Yifan Cui, Min-ge Xie
30. Logistic Regression Tree Analysis

Ordinary logistic regression (OLR) models the probability of a binary outcome. A logistic regression tree (LRT) is a machine learning method that partitions the data and fits an OLR model in each partition. This chapter motivates LRT by highlighting the challenges of OLR with respect to model selection, interpretation, and visualization on a completely observed dataset. Being nonparametric, an LRT model typically has higher prediction accuracy than OLR for large datasets. Further, by sharing model complexity between the tree structure and the OLR node models, the latter can be made simple for easier interpretation and visualization. OLR is more challenging if there are missing values in the predictor variables, because imputation must be carried out first. The second part of the chapter reviews the GUIDE method of constructing LRT models. A strength of GUIDE is its ability to deal with large numbers of variables without the need to impute missing values. This is demonstrated on a vehicle crash-test dataset for which imputation is difficult due to missing values and other problems.

Wei-Yin Loh
31. Detecting Outliers and Influential and Sensitive Observations in Linear Regression

This chapter reviews diagnostic and robust procedures for detecting outliers and other interesting observations in linear regression. First, we present statistics for detecting single outliers and influential observations and show their limitations for multiple outliers in high-leverage situations. Second, we discuss diagnostic procedures designed to avoid masking by first finding a clean subset for estimating the parameters and then increasing its size by incorporating, one by one, new homogeneous observations until a heterogeneous observation is found. We also discuss procedures based on sensitive observations for detecting high-leverage outliers in large data sets using the eigenvectors of a sensitivity matrix. We briefly review robust estimation methods and their relationship with diagnostic procedures. Next, we consider large high-dimensional data sets, where the application of iterative procedures can be slow, and show that the joint use of simple univariate statistics, such as predictive residuals, Cook's distances, and Peña's sensitivity statistic, can be a useful diagnostic tool. We also comment on other recent procedures based on regularization and sparse estimation and conclude with a brief analysis of the relationship between outlier detection and cluster analysis. A real-data example and a simulated example are presented to illustrate the procedures presented in the chapter.

Daniel Peña
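
As a hedged illustration of the single-case diagnostics discussed first in the chapter (whose limitations for multiple high-leverage outliers the chapter goes on to examine), the sketch below computes leverages, studentized residuals, and Cook's distances from the hat matrix for simulated data with one planted outlier.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated regression data with one planted high-leverage outlier in the last row.
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -1.0])
y = X @ beta + rng.normal(scale=0.5, size=n)
X[-1, 1:] = [5.0, 5.0]       # high-leverage point ...
y[-1] += 8.0                 # ... that is also an outlier in the response

# OLS fit, hat (leverage) values, internally studentized residuals, Cook's distances.
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
resid = y - H @ y
s2 = resid @ resid / (n - p)
r_stud = resid / np.sqrt(s2 * (1 - h))
cooks_d = r_stud**2 * h / (p * (1 - h))
print("largest Cook's distance at observation:", int(np.argmax(cooks_d)))
```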
32. Statistical Methodologies for Analyzing Genomic Data

The purpose of this chapter is to describe and review a variety of statistical issues and methods related to the analysis of microarray data. In the first section, after a brief introduction to DNA microarray technology in biochemical and genetic research, we provide an overview of four levels of statistical analyses. The subsequent sections present the methods and algorithms in detail. In the second section, we describe the methods for identifying significantly differentially expressed genes in different groups. The methods include fold change, different t-statistics, an empirical Bayesian approach, and significance analysis of microarrays (SAM). We further illustrate SAM using a publicly available colon cancer dataset as an example. We also discuss multiple comparison issues and the use of the false discovery rate. In the third section, we present various algorithms and approaches for studying the relationships among genes, particularly clustering and classification. In clustering analysis, we discuss hierarchical clustering, k-means, and probabilistic model-based clustering in detail with examples. We also describe the adjusted Rand index as a measure of agreement between different clustering methods. In classification analysis, we first define some basic concepts related to classification. Then we describe four commonly used classification methods: linear discriminant analysis (LDA), support vector machines (SVM), neural networks, and tree- and forest-based classification. Examples are included to illustrate SVM and tree- and forest-based classification. The fourth section is a brief description of the meta-analysis of microarray data in three different settings: meta-analysis of the same biomolecule and same platform microarray data, meta-analysis of the same biomolecule but different platform microarray data, and meta-analysis of different biomolecule microarray data. We end this chapter with final remarks on future prospects of microarray data analysis.

Fenghai Duan, Heping Zhang
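
A hedged sketch of the simplest gene-level analysis listed above: gene-wise two-sample t-tests followed by Benjamini-Hochberg control of the false discovery rate (not SAM itself); the expression matrix is simulated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Simulated expression matrix: 2000 genes x 20 arrays (10 tumor, 10 normal);
# the first 100 genes are truly differentially expressed.
genes, n1, n2 = 2000, 10, 10
tumor = rng.normal(size=(genes, n1))
normal = rng.normal(size=(genes, n2))
tumor[:100] += 1.5

# Gene-wise two-sample t-tests, then Benjamini-Hochberg control of the FDR at 5%.
t_stat, pvals = stats.ttest_ind(tumor, normal, axis=1)
order = np.argsort(pvals)
m, q = genes, 0.05
passed = pvals[order] <= q * (np.arange(1, m + 1) / m)
k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
significant_genes = order[:k]
print(f"{k} genes declared significant at FDR 5%")
```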
33. Genetic Algorithms and Their Applications

The first part of this chapter describes the foundation of genetic algorithms. It includes hybrid genetic algorithms, adaptive genetic algorithms, and fuzzy logic controllers. After a short introduction to genetic algorithms, the second part describes combinatorial optimization problems, including the knapsack problem, the minimum spanning tree problem, the set-covering problem, the bin-packing problem, and the traveling-salesman problem; these are combinatorial optimization problems characterized by a finite number of feasible solutions. The third part describes network design problems. Network design and routing are important issues in the building and expansion of computer networks. In this part, the shortest-path problem, maximum-flow problem, minimum-cost-flow problem, centralized network design, and multistage process planning problem are introduced. These problems are typical network problems and have been studied for a long time. The fourth part describes scheduling problems. Many scheduling problems from manufacturing industries are quite complex in nature and very difficult to solve by conventional optimization techniques. In this part, the flow-shop sequencing problem, job-shop scheduling, the resource-constrained project scheduling problem, and multiprocessor scheduling are introduced. The fifth part introduces the reliability design problem, including simple genetic algorithms for reliability optimization, reliability design with redundant units and alternatives, network reliability design, and tree-based network topology design. The sixth part describes logistic problems, including the linear transportation problem, the multiobjective transportation problem, the bicriteria transportation problem with fuzzy coefficients, and supply chain management network design. Finally, the last part describes location and allocation problems, including the location-allocation problem, the capacitated plant-location problem, and the obstacle location-allocation problem.

Mitsuo Gen, Lin Lin
34. Deterministic and Stochastic DCA for DC Programming

In the context of big data analysis, stochastic optimization algorithms are widely used as effective tools to handle data complexity and data uncertainty. These algorithms usually aim to solve problems modeled as stochastic programs, some of which have nonconvex objective functions. On the other hand, DCA (difference of convex functions algorithm) has proven its strength in tackling a large class of smooth or nonsmooth, nonconvex optimization problems known as DC programs. The key advantages of DCA come from its simplicity and flexibility, which allow it to treat large-scale problems arising in various contexts. This chapter concerns methods that incorporate ideas of stochastic optimization, in an online manner, into the DCA framework to create new algorithms called online stochastic DCA. The first section introduces the chapter. The second section accounts for deterministic DC programming and DCA. The third section briefly reviews stochastic optimization. The fourth section is dedicated to stochastic DC programming and DCA, where we propose two online stochastic DCA schemes for solving a class of stochastic DC programs. The last section concludes the chapter with discussions about promising aspects of the topic.

Hoai An Le Thi, Tao Pham Dinh, Hoang Phuc Hau Luu, Hoai Minh Le
35. Inference for Coherent Systems with Weibull Components Under a Simple Step-Stress Model

Coherent systems are widely studied in reliability experiments. Under the assumption that the components of a coherent system follow a two-parameter Weibull distribution, maximum likelihood inference for n-component coherent systems with known signatures under a simple step-stress model is discussed in this paper. The detailed steps of the stochastic expectation maximization algorithm under this setup are also developed to obtain estimates of the model parameters. Asymptotic confidence intervals for the model parameters are constructed using the observed Fisher information matrix and the missing information principle. A parametric bootstrap approach is also used to construct confidence intervals for the parameters. A method based on best linear unbiased estimators is developed to provide initial values that are needed for numerical computation of the maximum likelihood estimates. The performance of the methods developed is assessed through an extensive Monte Carlo simulation study. Finally, two numerical examples are presented for illustrative purposes.

Narayanaswamy Balakrishnan, Debanjan Mitra, Xiaojun Zhu
36. Bivariate Distributions with Singular Components

In this chapter we mainly discuss classes of bivariate distributions with singular components. It is observed that there are mainly two different ways of defining bivariate distributions with singular components when the marginals are absolutely continuous. Most of the bivariate distributions available in the literature can be obtained from these two general classes. A connection between the two approaches can be established based on their copulas; under certain restrictions both classes have very similar copulas. Several properties of these proposed classes can be established. It is observed that the maximum likelihood estimators (MLEs) may not always exist; whenever they exist, they cannot be obtained in closed form. Numerical techniques are needed to compute the MLEs of the unknown parameters. Alternatively, a very efficient expectation maximization (EM) algorithm can be used to compute the MLEs. The corresponding observed Fisher information matrix can also be obtained quite conveniently at the last stage of the EM algorithm, and it can be used to construct confidence intervals for the unknown parameters. The analysis of one data set is performed to demonstrate the effectiveness of the EM algorithm. We discuss different generalizations, propose several open problems, and finally conclude the chapter.

Debasis Kundu
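
As a hedged concrete example of a bivariate distribution with a singular component, the sketch below simulates the classical Marshall-Olkin bivariate exponential via its shock construction, where the common shock places positive probability on the event {X = Y}; the rates are arbitrary, and this particular model is offered only as an illustration of the kind of distribution the chapter's two general classes contain.

```python
import numpy as np

rng = np.random.default_rng(7)

def marshall_olkin_exponential(l1, l2, l12, size):
    """Marshall-Olkin bivariate exponential via independent shocks:
    X = min(U1, U12), Y = min(U2, U12).  The common shock U12 puts positive
    probability on the event {X = Y}, which is the singular component."""
    u1 = rng.exponential(1.0 / l1, size)
    u2 = rng.exponential(1.0 / l2, size)
    u12 = rng.exponential(1.0 / l12, size)
    return np.minimum(u1, u12), np.minimum(u2, u12)

x, y = marshall_olkin_exponential(l1=1.0, l2=1.5, l12=0.5, size=100_000)
# Theoretical P(X = Y) = l12 / (l1 + l2 + l12) = 1/6 for these rates.
print("empirical P(X = Y):", np.mean(x == y))
```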
37. Bayesian Models

Bayesian modelling has come a long way since the first appearance of Bayes' theorem. It is now being applied in almost every scientific field. Scientists and practitioners are choosing Bayesian methodologies over the classical frequentist framework because of the rigorous mathematical framework and the ability to incorporate prior information by defining a prior distribution on the possible values of the unknown parameter. In this chapter we briefly discuss various aspects of Bayesian modelling. Starting from a short introduction to conditional probability, Bayes' theorem, different types of prior distributions, hierarchical and empirical Bayes, and point and interval estimation, we describe Bayesian regression modelling in more detail. Then we mention an array of Bayesian computational techniques, viz. Laplace approximations, the E-M algorithm, Monte Carlo sampling, importance sampling, Markov chain Monte Carlo algorithms, the Gibbs sampler, and the Metropolis-Hastings algorithm. We also discuss model selection tools (e.g. DIC, WAIC, cross-validation, Bayes factor, etc.) and convergence diagnostics for MCMC algorithms (e.g. Geweke diagnostics, effective sample size, Gelman-Rubin diagnostic, etc.). We end the chapter with some applications of Bayesian modelling and discuss some of the drawbacks of using Bayesian modelling in practice.

Ashis Kumar Chakraborty, Soumen Dey, Poulami Chakraborty, Aleena Chanda
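
A hedged sketch of one of the MCMC algorithms listed above, random-walk Metropolis-Hastings, applied to the posterior of a normal mean with known variance and a normal prior (a case whose posterior is available in closed form, so the output can be checked); the data, prior, and tuning constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)

# Data and model: y_i ~ N(mu, sigma^2) with sigma known, prior mu ~ N(0, 10^2).
y = rng.normal(loc=2.0, scale=1.0, size=30)
sigma, prior_sd = 1.0, 10.0

def log_posterior(mu):
    log_lik = -0.5 * np.sum((y - mu) ** 2) / sigma**2
    log_prior = -0.5 * mu**2 / prior_sd**2
    return log_lik + log_prior

# Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.
draws, mu_cur, step = [], 0.0, 0.5
for _ in range(20_000):
    mu_prop = mu_cur + rng.normal(scale=step)
    if np.log(rng.uniform()) < log_posterior(mu_prop) - log_posterior(mu_cur):
        mu_cur = mu_prop
    draws.append(mu_cur)

post = np.array(draws[5_000:])          # discard burn-in
print(f"posterior mean ~ {post.mean():.3f}, 95% credible interval "
      f"({np.quantile(post, 0.025):.3f}, {np.quantile(post, 0.975):.3f})")
```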

Statistical Computing and Data Mining

Frontmatter
38. Data Mining Methods and Applications

In this chapter, we provide a review of the knowledge discovery process, including data handling, data mining methods and software, and current research activities. The introduction defines and provides a general background to data mining and knowledge discovery in databases, followed by an outline of the entire process in the second part. The third part presents data handling issues, including databases and preparation of the data for analysis. The fourth part, the core of the chapter, describes popular data mining methods, separated into supervised and unsupervised learning. Supervised learning methods are described in the context of both regression and classification, beginning with the simplest case of linear models, then presenting more complex modeling with trees, neural networks, and support vector machines, and concluding with some methods intended only for classification. Unsupervised learning methods are described under two categories: association rules and clustering. The fifth part presents past and current research projects, involving both industrial and business applications. Finally, the last part provides a brief discussion of remaining problems and future trends.

Kwok-Leung Tsui, Victoria Chen, Wei Jiang, Fangfang Yang, Chen Kan
39. Statistical Methods for Tensor Data Analysis

This chapter provides a brief introduction to tensors and a selective overview of tensor data analysis, which has become an increasingly popular and challenging topic in multivariate statistics. We review the current literature on statistical models and methods for tensor data analysis.
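One basic operation underlying many tensor methods is mode-k unfolding (matricization), whose column ranks give the multilinear rank. A small NumPy sketch (my own illustration, not code from the chapter):

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-k unfolding: move mode `mode` to the front and flatten the rest,
    giving a matrix of shape (I_mode, product of the other dimensions)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

T = np.arange(24).reshape(2, 3, 4)   # a 2 x 3 x 4 tensor
for k in range(T.ndim):
    print("mode", k, "unfolding shape:", unfold(T, k).shape)

# The multilinear (Tucker) rank is the tuple of ranks of these unfoldings
ranks = tuple(np.linalg.matrix_rank(unfold(T, k)) for k in range(T.ndim))
print("multilinear rank:", ranks)
```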

Qing Mai, Xin Zhang
40. Random Forests for Survival Analysis and High-Dimensional Data

One of the most commonly encountered problems in biomedical studies is the analysis of censored survival data. Survival analysis differs from standard regression problems in one central feature: the event of interest may not be fully observed. Therefore, statistical methods used to analyze such data must be adapted to handle the missing information. In this chapter, we provide a brief introduction to right-censored survival data and introduce survival random forest models for analyzing them. Random forests are among the most popular machine learning algorithms, and during the past decade they have seen tremendous success in biomedical studies for prediction and decision-making. In addition to the statistical formulation, we also provide details of tuning parameters commonly considered in practice. An analysis of breast cancer relapse-free survival data is used as a demonstration. We further introduce the variable importance measure, which serves as a variable selection tool in high-dimensional analysis. These examples are carried out using a newly developed R package, RLT, which is available on GitHub.
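The chapter's analyses use the R package RLT; as a language-agnostic sketch of one common ingredient of survival trees (not the RLT implementation), the two-sample log-rank statistic below scores a candidate split of right-censored data into two child nodes:

```python
import numpy as np

def logrank_statistic(time, event, group):
    """Two-sample log-rank chi-square statistic for right-censored data.
    `group` is 0/1 for the two child nodes of a candidate split."""
    time, event, group = map(np.asarray, (time, event, group))
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):         # distinct event times
        at_risk = time >= t
        n = at_risk.sum()                         # total at risk
        n1 = (at_risk & (group == 1)).sum()       # at risk in group 1
        d = ((time == t) & (event == 1)).sum()    # deaths at t
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var

time  = [5, 8, 12, 12, 20, 23, 30, 33]
event = [1, 1, 0, 1, 1, 0, 1, 1]
group = [0, 0, 0, 0, 1, 1, 1, 1]
print("log-rank chi-square:", logrank_statistic(time, event, group))
```

A survival tree would evaluate this statistic over many candidate splits and pick the one with the largest value.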

Ruoqing Zhu, Sarah E. Formentini, Yifan Cui
41. Probability Inequalities for High-Dimensional Time Series Under a Triangular Array Framework

The study of time series data often involves measuring the strength of temporal dependence, on which statistical properties such as consistency and central limit theorems are built. Historically, various dependence measures have been proposed. In this chapter, we first survey some of the most widely used dependence measures, as well as various probability and moment inequalities built upon them, under a high-dimensional triangular array time series setting. We then argue that this triangular array setting poses substantially new challenges for the verification of some dependence conditions. In particular, “textbook results” can now be misleading and should be used with caution.

Fang Han, Wei Biao Wu
42. Statistical Machine Learning

We are living in the golden era of machine learning, as it has been deployed in a wide range of applications and fields and has become the statistical and computational foundation for data processing. Although most of the existing algorithms in machine learning have been around for decades, the area is still booming. Machine learning studies theories and algorithms from statistics, computer science, and optimization, and their interplay with one another. This chapter provides a comprehensive review of past and recent state-of-the-art machine learning techniques and their applications in different domains. We focus on practical algorithms of various machine learning techniques and their evolution. An in-depth analysis and comparison based on the main concepts is presented. Different learning types are studied to investigate each technique's goals, limitations, and advantages. Moreover, a case study is presented to illustrate the concepts explained and to make a practical comparison. This chapter helps researchers understand the challenges in this area, which can be turned into future research opportunities, while gaining a core understanding of the most recent methodologies in machine learning.

Maryam Arabzadeh Jamali, Hoang Pham
43. Covariance Estimation via the Modified Cholesky Decomposition

In many engineering applications, the estimation of covariance and precision matrices is of great importance, helping researchers understand the dependency and conditional dependency between variables of interest. Among various matrix estimation methods, the modified Cholesky decomposition is a commonly used technique. It has the advantage of transforming the matrix estimation task into solving a sequence of regression models. Moreover, sparsity in the regression coefficients implies a certain sparse structure in the covariance and precision matrices. In this chapter, we first give an overview of Cholesky-based estimation of covariance and precision matrices. It is known that Cholesky-based matrix estimation depends on a prespecified ordering of the variables, which is often not available in practice. To address this issue, we then introduce several techniques that enhance the Cholesky-based estimation of covariance and precision matrices. These approaches ensure the positive definiteness of the matrix estimate and are applicable in general situations without specifying an ordering of the variables. The advantages of Cholesky-based estimation are illustrated by numerical studies and several real-case applications.
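A minimal sketch of the unregularized version of this idea, assuming a prespecified variable ordering (the chapter's methods add sparsity penalties and ordering-free refinements on top of this): regress each variable on its predecessors, collect the negative coefficients in a unit lower-triangular matrix T and the residual variances in a diagonal D, and form the precision matrix as T' D^{-1} T, which is positive definite by construction.

```python
import numpy as np

def modified_cholesky_precision(X):
    """Precision-matrix estimate via the modified Cholesky decomposition:
    T Sigma T' = D with T unit lower triangular, so Omega = T' D^{-1} T."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    T = np.eye(p)
    d = np.empty(p)
    d[0] = Xc[:, 0].var()
    for j in range(1, p):
        # regress variable j on variables 0..j-1 (a sequence of regressions)
        coef, *_ = np.linalg.lstsq(Xc[:, :j], Xc[:, j], rcond=None)
        T[j, :j] = -coef
        d[j] = (Xc[:, j] - Xc[:, :j] @ coef).var()
    return T.T @ np.diag(1.0 / d) @ T

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.5, 0.0, 0.0],
                  [0.5, 1.0, 0.5, 0.0],
                  [0.0, 0.5, 1.0, 0.5],
                  [0.0, 0.0, 0.5, 1.0]])
X = rng.multivariate_normal(np.zeros(4), Sigma, size=2000)
print(np.round(modified_cholesky_precision(X), 2))
```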

Xiaoning Kang, Zhiyang Zhang, Xinwei Deng
44. Statistical Learning

One of the main goals of statistical learning is to characterize how the excess risk depends on the sample size n, on the complexity of the hypothesis class, and on the underlying complexity of the prediction problem itself. A related problem is to control the generalization error, which measures how accurately an algorithm is able to predict outcomes for previously unseen data. Establishing probability error bounds for these problems can be converted into a problem of uniform convergence. We first introduce some commonly used technical tools for uniform convergence. Along the way, we highlight recent developments in learning theory for deep neural networks (DNNs) and explain the theoretical benefits for improving the generalization error in practice. Furthermore, we present the generalization of DNNs for robust adversarial learning with ℓ∞ attacks. For general machine learning tasks, we show that the adversarial Rademacher complexity is always larger than its natural counterpart, but the effect of adversarial perturbations can be limited under the weight normalization framework.
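As a small numerical companion to the Rademacher complexity bounds discussed here (my own toy setup, not from the chapter): for the linear class {x ↦ ⟨w, x⟩ : ‖w‖₂ ≤ B}, the empirical Rademacher complexity has the closed form (B/n)·E_σ‖Σᵢ σᵢ xᵢ‖₂ by Cauchy-Schwarz, which can be checked by Monte Carlo against the standard O(1/√n) bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, B = 200, 10, 1.0
X = rng.normal(size=(n, d))

def empirical_rademacher(X, B=1.0, n_mc=2000):
    """Empirical Rademacher complexity of {x -> <w, x> : ||w||_2 <= B}.
    The inner supremum equals (B/n) * || sum_i sigma_i x_i ||_2."""
    n = X.shape[0]
    sigma = rng.choice([-1.0, 1.0], size=(n_mc, n))   # Rademacher signs
    return (B / n * np.linalg.norm(sigma @ X, axis=1)).mean()

print("Monte Carlo estimate:", empirical_rademacher(X, B))
# Standard upper bound: B * sqrt(sum_i ||x_i||^2) / n  (order 1/sqrt(n))
print("upper bound         :", B * np.sqrt((X ** 2).sum()) / n)
```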

Qingyi Gao, Xiao Wang
45. Bayesian Survival Analysis in the Presence of Monotone Likelihoods

The monotone likelihood problem is often encountered in the analysis of time-to-event data under a parametric regression model or a Cox proportional hazards regression model when the sample size is small or the events are rare. For example, with a binary covariate, the subjects can be divided into two groups; if the event of interest does not occur (zero events) for all subjects in one of the groups, the resulting likelihood function is monotone and consequently the covariate effects are difficult to estimate. In this chapter, we carry out an in-depth examination of the conditions for the monotone likelihood problem under a parametric regression model and for the monotone partial likelihood under the Cox proportional hazards regression model. We review and discuss Bayesian approaches to handling the monotone likelihood and monotone partial likelihood problems. We analyze test data from a tire reliability study in detail.
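A tiny numerical illustration of the phenomenon (my own toy data, not the tire data analyzed in the chapter): with a binary covariate and all events in the z = 0 group, the Cox log partial likelihood keeps increasing as the coefficient decreases, so no finite maximum partial likelihood estimate exists.

```python
import numpy as np

# Toy data: all events occur in group z = 0; group z = 1 has only censored times.
time  = np.array([2.0, 4.0, 6.0, 8.0, 5.0, 9.0])
event = np.array([1,   1,   1,   1,   0,   0  ])
z     = np.array([0,   0,   0,   0,   1,   1  ])

def cox_log_partial_lik(beta):
    ll = 0.0
    for i in np.where(event == 1)[0]:
        risk_set = time >= time[i]
        ll += beta * z[i] - np.log(np.sum(np.exp(beta * z[risk_set])))
    return ll

for beta in [-1.0, -5.0, -10.0, -20.0]:
    print(f"beta = {beta:6.1f}   log partial likelihood = {cox_log_partial_lik(beta):.4f}")
# The log partial likelihood increases monotonically as beta -> -infinity,
# so the MLE does not exist; this is what motivates the Bayesian remedies.
```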

Jing Wu, Mário de Castro, Ming-Hui Chen
46. Multivariate Modeling with Copulas and Engineering Applications

This chapter reviews multivariate modeling using copulas, with illustrative applications in engineering such as multivariate process control and degradation analysis. A copula separates the dependence structure of a multivariate distribution from its marginal distributions. Properties and statistical inference of copula-based multivariate models are discussed in detail. Applications in engineering are illustrated via examples of bivariate process control and degradation analysis, using existing data in the literature. An R package, copula, facilitates the development and application of copula-based methods. The major change from the previous version (Yan 2006) is the update on the R package copula (Hofert et al. 2018).

Section 46.1 provides the background and motivation for multivariate modeling with copulas. Most multivariate statistical methods are based on the multivariate normal distribution, which cannot meet the practical need to fit non-normal multivariate data. Copula-based multivariate distributions offer much more flexibility in modeling various non-normal data. They have been widely used in insurance, finance, risk management, and medical research. This chapter focuses on their applications in engineering.

Section 46.2 introduces the concept of copulas and its connection to multivariate distributions. The most important result about copulas is Sklar's (1959) theorem, which shows that any continuous multivariate distribution has a canonical representation by a unique copula and all its marginal distributions. Scale-invariant dependence measures for two variables, such as Kendall's tau and Spearman's rho, are completely determined by their copula. The extremes of these two concordance measures, −1 and 1, are obtained under perfect dependence, corresponding to the Fréchet-Hoeffding lower and upper bounds of copulas, respectively. A general algorithm to simulate random vectors from a copula is also presented.

Section 46.3 introduces two commonly used classes of copulas: elliptical copulas and Archimedean copulas. Elliptical copulas are copulas of elliptical distributions; the two most widely used, the normal copula and the t copula, are discussed. Archimedean copulas are constructed without referring to distribution functions and random variables. Three popular Archimedean families, the Clayton copula, the Frank copula, and the Gumbel copula, each having a mixture representation with a known frailty distribution, are discussed. Simulation algorithms are also presented.

Section 46.4 presents the maximum likelihood inference of copula-based multivariate distributions given the data. Three likelihood approaches are introduced. The exact maximum likelihood approach estimates the marginal and copula parameters simultaneously by maximizing the exact parametric likelihood. The inference-functions-for-margins approach is a two-step approach, which estimates the marginal parameters separately for each margin in a first step and then estimates the copula parameters given the marginal parameters. The canonical maximum likelihood approach is for copula parameters only, using uniform pseudo-observations obtained by transforming all the margins by their empirical distribution functions.

Section 46.5 presents two novel engineering applications. The first example is a bivariate process control problem, where marginal normality seems appropriate but joint normality is suspicious; a Clayton copula provides a better fit to the data than a normal copula, and, through simulation, the upper control limit of Hotelling's T2 chart based on normality is shown to be misleading when the true copula is a Clayton copula. The second example is a degradation analysis, where all the margins are skewed and heavy-tailed; a multivariate gamma distribution with a normal copula fits the data much better than a multivariate normal distribution.

Section 46.6 concludes and points to references on other aspects of copula-based multivariate modeling that are not discussed in this chapter.
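As a small NumPy sketch of the mixture (frailty) representation mentioned for Archimedean copulas (the chapter itself works with the R package copula; this Python illustration and its parameter values are my own): the Clayton copula can be simulated from a gamma frailty, and the sample Kendall's tau can be checked against the relationship τ = θ/(θ + 2).

```python
import numpy as np
from scipy.stats import kendalltau

def rclayton(n, theta, seed=None):
    """Simulate the Clayton copula via its gamma-frailty (mixture) representation:
    W ~ Gamma(1/theta), U_i = (1 + E_i / W)^(-1/theta) with E_i ~ Exp(1)."""
    rng = np.random.default_rng(seed)
    w = rng.gamma(shape=1.0 / theta, scale=1.0, size=n)
    e = rng.exponential(size=(n, 2))
    return (1.0 + e / w[:, None]) ** (-1.0 / theta)

theta = 2.0
u = rclayton(20_000, theta, seed=1)
tau_hat, _ = kendalltau(u[:, 0], u[:, 1])
print("sample Kendall's tau:", round(tau_hat, 3))
print("theoretical tau     :", theta / (theta + 2))   # = 0.5 for theta = 2
# Non-normal margins are attached by inverse-CDF transforms of the uniforms,
# e.g. x = -np.log(1 - u) gives Exp(1) margins with Clayton dependence.
```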

Jun Yan

Applications in Engineering Statistics

Frontmatter
47. Environmental Risks Analysis Using Satellite Data

In this chapter, the methodology of risk assessment and decision making in the field of environmental security is analyzed in view of complex novel threats and challenges connected with important multiscale tendencies: global climate and environmental change, globalization, decentralization, and social transformation. To provide a methodological basis for increasing the effectiveness of environmental security management, a number of tasks are analyzed. A nonparametric two-stage method for multi-source data coupling and spatial-temporal regularization is proposed and discussed. Next, an approach to multiscale local and global model integration based on a modified ensemble transform Kalman filtering procedure is proposed. An approach to risk assessment based on nonparametric kernel analysis of coherent complex measures of multidimensional multivariate distributions is then proposed. A decision-making approach for environmental risk analysis using satellite data and multi-model data is also considered and discussed. A number of important algorithms are described. Finally, the capabilities, limitations, and perspectives of the proposed methods and algorithms are discussed.

Yuriy V. Kostyuchenko
48. Probabilistic Models for Reliability Analysis Using Safe-Life and Damage Tolerance Methods

This chapter presents a systematic method for probabilistic analysis using safe-life and damage tolerance models. In particular, reliability analyses incorporating those models are developed to provide a basic framework for life prediction under risk constraints and for time-dependent probability-of-failure estimation. First, the probabilistic modeling is presented, and the uncertainties from model prediction and data are considered. These uncertainties are quantified and encoded in the probability density functions of the model parameters using probabilistic parameter estimation. The propagation of the characterized uncertainties to the quantity of interest can be obtained using probabilistic prediction. Next, the reliability model based on the probabilistic modeling is introduced, and the safe-life model and the damage tolerance model are discussed in detail. Life prediction under a given risk constraint and time-dependent probability-of-failure estimation can be carried out using the developed method. Two examples are employed to demonstrate the overall method.
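To make the uncertainty-propagation step concrete, here is a Monte Carlo sketch of a time-dependent probability of failure for a Paris-law crack-growth (damage tolerance) model. The model form is standard, but all parameter distributions and values below are made up purely for illustration and are not the chapter's examples:

```python
import numpy as np

rng = np.random.default_rng(0)
n_mc = 100_000

# Paris-law model: da/dN = C * (Y * dS * sqrt(pi * a))^m,
# with crack size a in metres and stress range dS in MPa (illustrative values).
C  = 10 ** rng.normal(-10.5, 0.15, n_mc)         # Paris coefficient (log-normal)
m  = rng.normal(3.0, 0.05, n_mc)                 # Paris exponent
a0 = rng.lognormal(np.log(0.5e-3), 0.2, n_mc)    # initial crack size [m]
dS = rng.normal(120.0, 10.0, n_mc)               # stress range [MPa]
Y, a_crit = 1.12, 25e-3                          # geometry factor, critical size [m]

# Closed-form cycles-to-failure from integrating the Paris law (valid for m != 2)
e = 1.0 - m / 2.0
N_f = (a_crit ** e - a0 ** e) / (e * C * (Y * dS * np.sqrt(np.pi)) ** m)

for N in [5e4, 1e5, 2e5, 5e5]:
    print(f"P(failure by {N:8.0f} cycles) ~ {np.mean(N_f <= N):.4f}")
```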

Xuefei Guan, Jingjing He
49. Application of Cognitive Architecture in Multi-Agent Financial Decision Support Systems

The multi-criteria character of decision making entails the need to analyze and evaluate a large amount of information and to draw conclusions on the basis of that information. Since the process is time-consuming and practically impossible for a decision-maker to perform in real time, it is necessary to use computer decision support systems, including multi-agent systems. Cognitive technologies can be used to support financial decision-making; they employ natural language processing for sentiment analysis of the consumer environment and for buy/sell decision-making. The aim of this chapter is to present an approach to applying cognitive architecture in a multi-agent financial decision support system. On the basis of the research performed, it can be stated that cognitive architecture increases the usability of the multi-agent system and consequently improves the process of making investment decisions. The ability to make automatic decisions is also of high importance here; for example, the cognitive agent can make real transactions (open/close short/long positions) on the Forex market.

Marcin Hernes, Ngoc Thanh Nguyen
50. Reliability of Electronic Packaging

In the semiconductor industry, the study of reliability and the ability to predict the temperature-cycling fatigue life of electronic packaging are of great significance. For that purpose, researchers and engineers frequently employ the finite element method (FEM) in their analyses. It is primarily a mechanics analysis tool that takes material properties, manufacturing processes, and environmental factors into consideration. Engineers also like to use FEM in the design of electronic packages, but frequently the term “reliability” they refer to only addresses the robustness of a particular design and has little to do with probability and statistics. Meanwhile, in manufacturing factories of electronic products, including packaging, accelerated life testing (ALT) is often carried out by quality engineers to determine the life of a product under environmental conditions more severe than the field condition. Through regression analysis of the test results based on an empirical or semiempirical formula, the acceleration factor (AF) can be obtained and used to predict the service life of the product under the field condition. Again, other than regression analysis, little probability and statistics are involved. By taking parameter uncertainties into consideration, this chapter demonstrates by example that FEM, ALT, and AF can be combined to study the reliability of electronic packaging in a way that applies probability and statistics.
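As an illustration of the regression-to-AF step described above (with made-up temperature-cycling data and a Coffin-Manson-type life model as one common choice, not necessarily the formula used in the chapter): fit N_f = A·(ΔT)^(−b) by log-log least squares, then translate a test condition to a field condition via AF = (ΔT_test/ΔT_field)^b.

```python
import numpy as np

# Made-up temperature-cycling ALT results: temperature swing [deg C] vs.
# observed cycles to failure.
dT  = np.array([ 80,   80,  100,  100,  125,  125,  150,  150])
N_f = np.array([9500, 8700, 5300, 4800, 2900, 3100, 1900, 1750])

# Coffin-Manson-type model N_f = A * dT^(-b); linear on the log-log scale.
slope, logA = np.polyfit(np.log(dT), np.log(N_f), 1)
b = -slope
print(f"fitted exponent b = {b:.2f}")

# Acceleration factor from a test condition to a milder field condition
dT_test, dT_field = 125.0, 40.0
AF = (dT_test / dT_field) ** b
print(f"AF = {AF:.1f}")
print(f"median field life ~ {np.exp(logA) * dT_field ** (-b):.0f} cycles")
```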

Wen-Fang Wu, Yi-An Lu
51. Accelerated Life Testing

Accelerated life testing (ALT) is a widely used method during product design, with the aim of obtaining reliability information on components and subsystems in a timely manner. Different types of ALTs provide different information about the product and its failure mechanisms. To ensure that ALTs can assess product reliability accurately, quickly, and economically, designing efficient test plans is critically important. This chapter provides a concise discussion of the models and methods popularly used in ALT. The introduction describes the background and motivation for using accelerated testing and classifies the reliability tests. Section 51.2 provides the basic concepts and factors that should be taken into account in planning and conducting ALTs. Sections 51.3 and 51.4 provide brief descriptions of specific statistical models, including the life distribution and the life-stress relationship. Section 51.5 illustrates an approach for analyzing ALT data; graphical and numerical methods are discussed for fitting an ALT model to data and for assessing its fit. Section 51.6 describes research developments in methods for planning optimal ALTs with location-scale distributions. Section 51.7 reviews some potential pitfalls of ALT and gives some suggestions.
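One widely used life-stress relationship for temperature stress is the Arrhenius model; the sketch below (activation energy, temperatures, and Weibull estimates are illustrative assumptions of mine, not values from the chapter) computes the acceleration factor and scales a test-condition characteristic life back to the use condition.

```python
import numpy as np

K_BOLTZ_EV = 8.617e-5      # Boltzmann constant [eV/K]

def arrhenius_af(t_use_c, t_stress_c, ea_ev):
    """Arrhenius acceleration factor between a use and a stress temperature."""
    t_use, t_stress = t_use_c + 273.15, t_stress_c + 273.15
    return np.exp(ea_ev / K_BOLTZ_EV * (1.0 / t_use - 1.0 / t_stress))

# Assumed activation energy and temperatures (illustrative values only)
af = arrhenius_af(t_use_c=55.0, t_stress_c=125.0, ea_ev=0.7)
print(f"acceleration factor AF = {af:.1f}")

# Under a Weibull model with a stress-independent shape parameter beta,
# the characteristic life scales directly by AF:
eta_test, beta = 1200.0, 1.8            # hours at 125 C (assumed test estimate)
print(f"characteristic life at use condition ~ {eta_test * af:.0f} hours")
```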

Qingchuan He, Wen-Hua Chen, Jun Pan
52. Accelerated Life Testing Data Analyses for One-Shot Devices

One-shot device testing data arise from devices that can be used only once, for example, fire extinguishers, electro-explosive devices, and airbags in cars. In life tests, only the condition of each tested device at a specified time can be observed, instead of its actual lifetime. Such data are therefore either left- or right-censored. For these heavily censored data, there is an increasing need to develop innovative techniques for reliability analysis. In this chapter, we provide an overview of analyses of one-shot device testing data collected from accelerated life tests and discuss statistical issues in estimation and inference as well as optimal designs of accelerated life tests for one-shot devices.
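Because each device contributes only a binary outcome at its inspection time, the likelihood is of binomial type. A minimal sketch (my own toy data, assuming exponential lifetimes; the chapter also treats other models and accelerated-stress settings):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# One-shot device data: inspection time tau_i and indicator delta_i
# (1 = device had already failed at tau_i, i.e. left-censored lifetime;
#  0 = device still worked at tau_i, i.e. right-censored lifetime).
tau   = np.array([10., 10., 10., 20., 20., 20., 30., 30., 30., 30.])
delta = np.array([ 0,   0,   1,   0,   1,   1,   1,   1,   0,   1 ])

def neg_log_lik(lam):
    # Exponential lifetime assumption: F(t) = 1 - exp(-lam * t)
    p_fail = 1.0 - np.exp(-lam * tau)
    return -np.sum(delta * np.log(p_fail) + (1 - delta) * (-lam * tau))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1.0), method="bounded")
print(f"MLE of failure rate lambda: {res.x:.4f}")
print(f"estimated reliability at t = 15: {np.exp(-res.x * 15):.3f}")
```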

Narayanaswamy Balakrishnan, Man Ho Ling
53. Tangent Space Approximation in Geometric Statistics

The Procrustes regression model provides a statistical framework to assess the errors in image registration (in arbitrary dimensions) from “landmark” data. The same mathematics can be used to determine the errors in calculated motions of rigid bodies in Euclidean space.

Perhaps the scientifically most compelling example of rigid body motion is tectonic plates. Tectonic plates, to a first approximation, move as rigid bodies on the surface of the Earth. The estimation of the past configuration of the tectonic plates, and the errors in these reconstructions, is integral to the understanding of the past history of the Earth. Because tectonic plates are restricted to the surface of the Earth, the Procrustes regression model does not apply and the relevant model is called spherical regression.

The Procrustes and spherical regression models are mathematically simple and, because of this simplicity, beautiful theorems about the properties of their estimates can be proven. The previous chapter (Chang T, Image registration, rigid bodies, and unknown coordinate systems. In: Pham H (ed) Springer handbook of engineering statistics, Springer-Verlag, London, pp. 571–590, 2006) discusses many of these results. Using the spherical regression model, interesting insights into the properties of tectonic plate reconstructions are discussed in (Chang, J Geophys Res 92(B7):6319–6329, 1987; Chang, Int Stat Rev 61:299–316, 1993). We will not replicate these results here. Rather, the focus of this chapter is the underlying mathematics used to establish the results. These problems are intrinsically geometric, and the proper use of geometry is important to their understanding. We will explain these points.

This chapter is motivated by a request from a geophysics friend to explain, in as elementary a fashion as feasible, the mathematics behind his work. This chapter will not replicate the formal proofs that appear elsewhere (see, for example, Chang, Ann Stat 14(3):907–924, 1986, Rivest, Ann Stat 17(1):307–317, 1989, or Chang and Ko, Ann Stat 23(5):1823–1847, 1995). This chapter is aimed at the scientist who wants to understand on a heuristic level why the results are true, without reading mathematically complete proofs!

The type of data that is actually used to estimate plate reconstructions (experimentally determined “marine magnetic anomaly lineation” locations) is not of the form that is modeled in the spherical regression model. We discuss in Sect. 53.6 the type of data that is actually collected and how to analyze it using the mathematical principles we will discuss here. For the nongeophysicist, this section can be used as an example of the use of the mathematical principles of this chapter in a different and complex data setting. Furthermore, although the rigid plate hypothesis is a simplification, one must first understand what types of reconstruction errors are consistent with rigid plates before deciding if the errors one is observing are in fact evidence of nonrigidity. Section 53.6 discusses some work of this type.

Essentially, this chapter is a case study in the use of tangent space approximations and some elementary ideas from differential geometry. Other authors have used similar geometrical approaches. For example, Rivest (J Biomech 38:1604–1611, 2005) and Oualkacha and Rivest (Biometrika 99:585–598, 2012) developed statistical methods for human motion studies derived from sensors placed on the human body as it moves. Chang and Rivest (Ann Stat 29(3):784–814, 2001) extended to Stiefel manifolds the work outlined here and in (Chang T, Image registration, rigid bodies, and unknown coordinate systems. In: Pham H (ed) Springer handbook of engineering statistics, Springer-Verlag, London, pp. 571–590, 2006) and used the results to reanalyze a data set on vector cardiograms. Patrangenaru has numerous papers (e.g., Mardia and Patrangenaru (Ann Stat 33(4):1666–1699, 2005)) developing statistical methods for comparing images when projective, rather than rigid, transformations are allowed. Indeed, the author believes that the engineering disciplines have multiple problems of a geometric nature and hopes that the approaches used here can be helpful in studying them.

This chapter has been written so that it can be read independently of the preceding chapter (Chang T, Image registration, rigid bodies, and unknown coordinate systems. In: Pham H (ed) Springer handbook of engineering statistics, Springer-Verlag, London, pp. 571–590, 2006).
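The tangent-space idea can be seen in a few lines of code (my own illustration, not the chapter's): a rotation near the identity is the matrix exponential of a skew-symmetric matrix, and the first-order approximation R ≈ I + A is what makes linear statistical methods applicable locally, with an error of second order in the rotation angle.

```python
import numpy as np
from scipy.linalg import expm

def skew(v):
    """Skew-symmetric matrix A with A @ x = v x x (cross product), so that
    expm(A) is the rotation by angle ||v|| about the axis v / ||v||."""
    return np.array([[    0, -v[2],  v[1]],
                     [ v[2],     0, -v[0]],
                     [-v[1],  v[0],     0]])

v = np.array([0.02, -0.01, 0.015])        # a small rotation vector (radians)
A = skew(v)
R_exact  = expm(A)                        # exact rotation: exp of the tangent vector
R_approx = np.eye(3) + A                  # first-order tangent-space approximation

print("rotation angle (rad):", np.linalg.norm(v))
print("max |R_exact - R_approx| =", np.abs(R_exact - R_approx).max())
# The discrepancy is O(||v||^2), which is why tangent-space (linear) statistics
# work well when the estimated rotations are concentrated near the truth.
```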

Ted Chang
54. Statistical Modeling of Discrete Choices for Human Behaviors

Human behavior models, especially choice models, have been developed and widely used in both research and practice for decades. They help explain human behavior by modeling the choices made by each individual. Successful application of these models requires a good understanding of their properties and assumptions. This chapter summarizes some commonly used discrete choice models in engineering. Different assumptions about the mechanism of choice-making behavior lead to different types of choice models. We start from random utility maximization (RUM) theory and present its basic usage in binary and multinomial choice scenarios. We also present random regret minimization (RRM) theory and relative advantage maximization (RAM) theory. Some extensions of RUM are presented as well, including the nested logit model, which allows dependencies among alternatives, and the mixed logit model, which accounts for individual heterogeneity. We illustrate the usage and interpretation of these models through a numerical case study.
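A minimal sketch of the multinomial logit model that RUM leads to (alternatives, attributes, and coefficients below are made-up toy values, not the chapter's case study): systematic utilities plus i.i.d. Gumbel errors give choice probabilities P(i) = exp(V_i)/Σ_j exp(V_j), which can also be checked by simulating the utility-maximization story directly.

```python
import numpy as np

def mnl_probabilities(V):
    """Multinomial logit choice probabilities under random utility maximization
    with i.i.d. Gumbel errors: P(i) = exp(V_i) / sum_j exp(V_j)."""
    expV = np.exp(V - V.max())           # subtract max for numerical stability
    return expV / expV.sum()

# Toy systematic utilities V = beta' x for three alternatives (car, bus, train)
beta = np.array([-0.1, -0.05])           # cost and travel-time coefficients
x = np.array([[20.0, 30.0],              # car:   cost, time
              [ 5.0, 50.0],              # bus
              [10.0, 40.0]])             # train
V = x @ beta
p = mnl_probabilities(V)
print(dict(zip(["car", "bus", "train"], np.round(p, 3))))

# Monte Carlo check of the RUM story: add Gumbel noise and count argmax choices
rng = np.random.default_rng(0)
eps = rng.gumbel(size=(100_000, 3))
shares = np.bincount(np.argmax(V + eps, axis=1), minlength=3) / 100_000
print("simulated shares:", np.round(shares, 3))
```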

Xi Zhu, Shuai Huang
55. Weighted Voting Systems

Voting is an ancient method for a group such as a meeting or an electorate to make a collective decision or express an opinion after discussion, deliberation, or an election campaign. Participants who express an opinion or choose a candidate are called voters; a simple voting system therefore consists of a certain number of qualified voters and candidates. A well-known example of a weighted voting system is the US Electoral College, where the number of electoral votes for each state is based upon its population. This chapter presents the modeling of threshold weighted voting systems and the analysis of two aspects: the indecisive effect and known/unknown inputs. The system reliability models presented in the chapter are based on the assumptions that the system operates with two types of input values (0, 1) and three types of output values (0, 1, x) with three types of errors, and that the components are unequally weighted and subject to three failure modes (stuck-at-0, stuck-at-1, stuck-at-x). For any weighted voting system, a decision rule is required, although different rules may result in different system performance in terms of system reliability. For instance, the current decision rule of the US Electoral College is that the candidate who obtains at least 270 electoral votes wins the election; as a result, four former US presidents have won elections while receiving fewer national popular votes than their opponents.

In general, a weighted voting system (WVS) consists of n units with individual weights, each of which provides a binary decision (0 or 1) or abstains (x) from voting. A generic decision rule can be defined as follows: the system output is 1 if the cumulative weight of all 1-opting units is at least a prespecified threshold τ of the cumulative weight of all nonabstaining units. If the indecisive effect is considered, the weights of abstaining units can be added to the decision rule, so that the system output is 1 if the cumulative weight of all 1-opting units is at least a prespecified threshold τ of the sum of all nonabstaining units plus a prespecified indecisive parameter θ of all abstaining units. The system fails if the generated output is not equal to its original input. Recent research results indicate that, under specified assumptions, multiple approaches can be used to quantify the reliability of a weighted voting system. This chapter demonstrates the development of decision rules and the evolution of approaches for deriving the reliability function. Related work is addressed to provide a full picture of WVS, and some future work is proposed.
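The threshold rule above translates directly into code; the sketch below implements one reading of it (the weight of 1-opting units compared against τ times the non-abstaining weight plus θ times the abstaining weight) and estimates system reliability by Monte Carlo under assumed per-unit error probabilities. Weights, thresholds, and error rates are illustrative, and the closed-form approaches developed in the chapter are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def wvs_output(votes, weights, tau=0.5, theta=0.0):
    """Threshold decision rule (one reading of the abstract): output 1 if the
    weight of 1-opting units is at least tau * (non-abstaining weight)
    + theta * (abstaining weight). Abstentions (x) are encoded as np.nan."""
    votes, weights = np.asarray(votes, float), np.asarray(weights, float)
    abstain = np.isnan(votes)
    w_ones = weights[votes == 1].sum()
    threshold = tau * weights[~abstain].sum() + theta * weights[abstain].sum()
    return 1 if w_ones >= threshold else 0

def simulate_reliability(weights, q0, q1, qx, tau=0.5, theta=0.0, n_mc=50_000):
    """Monte Carlo reliability: units report the true input unless hit by one of
    the three failure modes (stuck-at-0, stuck-at-1, stuck-at-x)."""
    weights = np.asarray(weights, float)
    correct = 0
    for _ in range(n_mc):
        true_input = rng.integers(0, 2)
        votes = np.full(len(weights), float(true_input))
        u = rng.random(len(weights))
        votes[u < q0] = 0.0                                   # stuck-at-0
        votes[(u >= q0) & (u < q0 + q1)] = 1.0                # stuck-at-1
        votes[(u >= q0 + q1) & (u < q0 + q1 + qx)] = np.nan   # stuck-at-x
        correct += wvs_output(votes, weights, tau, theta) == true_input
    return correct / n_mc

weights = [5, 3, 3, 2, 1]
print("estimated reliability:", simulate_reliability(weights, q0=0.05, q1=0.05, qx=0.10))
```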

Hainan Zhang
56. Image Registration, Rigid Bodies, and Unknown Coordinate Systems

This chapter deals with statistical problems involving image registration from landmark data, either in Euclidean 2-space R2 or 3-space R3. In this problem, we have two images of the same object (such as satellite images taken at different times) or an image of a prototypical object and an actual object. It is desired to find the rotation, translation, and possibly scale change that will best align the two images. Whereas many problems of this type are two-dimensional, it should be noted that medical imaging is often three-dimensional.

After discussing several estimation techniques and their calculation, we discuss the relative efficiency of the various estimators. These results are important in choosing an optimal estimator. The relationship of the geometry of the landmarks to the statistical properties of the estimators is discussed. Finally, we discuss diagnostics to determine which landmarks are most influential on the estimated registration. If the registration is unsatisfactory, these diagnostics can be used to determine which data points are most responsible and should be reexamined.
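The core least-squares registration step (rotation plus translation from paired landmarks) has a closed-form solution via the singular value decomposition; the sketch below shows this standard orthogonal-Procrustes/Kabsch computation with my own simulated landmarks, as a plain illustration rather than the chapter's estimators and diagnostics.

```python
import numpy as np

def register_landmarks(X, Y):
    """Least-squares rigid registration: rotation R and translation t
    minimizing sum_i || Y_i - (R @ X_i + t) ||^2 (Procrustes/Kabsch solution)."""
    xc, yc = X.mean(axis=0), Y.mean(axis=0)
    U, _, Vt = np.linalg.svd((X - xc).T @ (Y - yc))
    d = np.sign(np.linalg.det(U @ Vt))        # guard against reflections
    D = np.diag([1.0] * (X.shape[1] - 1) + [d])
    R = (U @ D @ Vt).T
    t = yc - R @ xc
    return R, t

# Toy check: rotate and translate 3-D landmarks, add noise, recover the motion
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
angle = 0.4
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
t_true = np.array([1.0, -2.0, 0.5])
Y = X @ R_true.T + t_true + 0.01 * rng.normal(size=X.shape)

R_hat, t_hat = register_landmarks(X, Y)
print("rotation error :", np.abs(R_hat - R_true).max())
print("translation err:", np.abs(t_hat - t_true).max())
```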

Ted Chang
Backmatter
Metadata
Title
Springer Handbook of Engineering Statistics
Edited by
Hoang Pham
Copyright Year
2023
Publisher
Springer London
Electronic ISBN
978-1-4471-7503-2
Print ISBN
978-1-4471-7502-5
DOI
https://doi.org/10.1007/978-1-4471-7503-2
