
Wood Science and Technology


Open Access | 1 August 2011 | Original

Improved estimation of the lower percentiles of material properties

Authors: David J. Edwards, Frank M. Guess, Timothy M. Young

Published in: Wood Science and Technology | Issue 3/2011


Abstract

Estimating the lower percentiles of reliability measures for medium-density fiberboard (MDF) matters to manufacturers: it helps them assess and improve manufacturing processes and set better product warranties while seeking lower costs. Because data in the lower tails may be sparse or costly, these percentiles can be difficult to estimate. Bootstrapping provides a helpful solution for interval estimation of lower percentiles when other approaches fail or are less realistic. This computer-intensive resampling technique estimates the true standard error of any population parameter, not just percentiles, more accurately. Bootstrapping can be used with parametric models or in nonparametric settings where parametric models are not appropriate. This paper shows the usefulness of bootstrap methods for better assessing the key quality metric of internal bond (IB, or tensile strength) of MDF in the critical lower percentiles when data are limited.

Introduction

There are many ways to measure the reliability of a component, subsystem, or system product being manufactured. See the classic reliability references of Barlow and Proschan (1975, 1981), plus the more recent Kuo et al. (1998, 2000) and Meeker and Escobar (1998, 2004). Often, the key reliability measures are the mean or median time to failure (Hoffmeyer and Sorensen 2007). Kim and Kuo (2003) stress the importance of percentiles in optimizing system life in contrast to other classical approaches; see also Prasad et al. (2001). Lower percentiles may be of critical importance to manufacturers of engineered wood panels, as such percentiles may represent product failure (Steiger and Arnold 2009).

This article discusses bootstrapping methods as a useful approach to understanding the reliability of manufactured medium-density fiberboard (MDF). The study is an outcome of Edwards (2004) and builds upon the study of wood plastics composites discussed in Young et al. (2008). The versatility of bootstrapping allows it to be used in a wide range of engineering and manufacturing settings where standard approaches might yield misleading numbers.

Many reliability studies aim to estimate percentiles, and interest usually lies in the lower percentiles. These lower percentiles are helpful for warranty analysis, understanding early failures during normal usage, improving the specification limits, reducing manufacturing costs, and avoiding costly product failure claims.

In this study, the authors focus on estimating percentiles of the internal bond (IB) strength of MDF, measured in kilopascals (kPa). The estimation procedure applies much more generally, however: to various manufacturing settings, lifetimes, service response times, repair times, or any kind of response time (time to assemble a product, etc.) where reliability can be improved by a more realistic assessment of uncertainty.

To be able to say that improvements have been made, one must be able to measure reliability expressed in percentiles that allow for the statistical uncertainty inherent in real data. Knowing when to trust confidence intervals and when not to trust them is crucial for engineers and technical managers (Moses et al. 2003).

Historically, the problem of estimating percentiles was not in finding point estimators, but in finding standard errors and thus confidence intervals of percentiles. Serfling (1980) thoroughly examines the asymptotic distribution of the sample quantile. In particular, under mild requirements (i.e., smoothness of the distribution function), the sample quantiles are asymptotically normal. This result is useful because it allows asymptotic normal confidence intervals for the pth quantile to be constructed. Meeker and Escobar (1998) discuss the construction of such intervals for the location-scale distributions commonly used in reliability data analysis (e.g., normal, lognormal, Weibull). In particular, an asymptotic normal confidence interval for $ t_{p} $ is given by:

$$ \hat{t}_{p} \pm z_{1 - \alpha /2} \hat{s}_{{\hat{t}_{p} }} $$

(1)

where $ \hat{t}_{p} $ is the estimated pth quantile, and $ \hat{s}_{{\hat{t}_{p} }} $ is the standard error of the estimate approximated by:

$$ \hat{s}_{{\hat{t}_{p} }} = \hat{t}_{p} \left\{ {\text{Var} \left( {\hat{\mu }} \right) + 2\Upphi^{ - 1} \left( p \right){\text{Cov}}\left( {\hat{\mu },\hat{\sigma }} \right) + \left[ {\Upphi^{ - 1} \left( p \right)} \right]^{2} \text{Var} \left( {\hat{\sigma }} \right)} \right\}^{1/2} .$$

(2)

Equation (2) is obtained using the delta method; $ \hat{\mu } $ and $ \hat{\sigma } $ are the maximum likelihood estimates (MLEs) of the location and scale parameters, respectively, and $ \Upphi^{ - 1} $ represents the inverse of the cumulative standardized location-scale distribution of interest. Var($ \hat{\mu } $), Var($ \hat{\sigma } $), and Cov($ \hat{\mu },\hat{\sigma } $) are obtained from the inverse of the observed information matrix.
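As an illustration of Eqs. (1) and (2), the following minimal Python sketch computes the asymptotic normal interval under two assumptions that are not taken from the study itself: a lognormal model and complete (uncensored) data. Under those assumptions the MLEs and the inverse observed information have closed forms on the log scale, so no numerical optimizer is needed. The function name `lognormal_asymptotic_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_asymptotic_ci(data, p, alpha=0.05):
    """Asymptotic normal interval for the p-th quantile, Eqs. (1)-(2).

    Assumes a lognormal model and complete (uncensored) data.
    Returns (t_p, lower limit, upper limit).
    """
    n = len(data)
    logs = [math.log(x) for x in data]
    mu = sum(logs) / n                                        # MLE of location
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)   # MLE of scale
    # Inverse observed information of the normal model on the log scale,
    # evaluated at the MLEs.
    var_mu = sigma ** 2 / n
    var_sigma = sigma ** 2 / (2 * n)
    cov = 0.0
    z_p = NormalDist().inv_cdf(p)                             # Phi^{-1}(p)
    t_p = math.exp(mu + z_p * sigma)
    # Delta-method standard error, Eq. (2).
    se = t_p * math.sqrt(var_mu + 2 * z_p * cov + z_p ** 2 * var_sigma)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return t_p, t_p - z * se, t_p + z * se                    # Eq. (1)
```

For another location-scale family, or for censored data, the three variance terms would instead have to be read from a numerically inverted information matrix.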

When the sample size is sufficiently large, the asymptotic normal intervals provide approximations that are usually good enough for practice. However, data may not be plentiful, and in many manufacturing settings parametric assumptions may be suspect or actually invalid, leading to a higher risk of inaccurate results. Asymptotic intervals are often criticized as unrealistic for small or even moderate sample sizes. Bootstrapping provides an alternative strategy that can realistically inform the practitioner through a more accurate assessment of the variability inherent in a system or process.

Methods

MDF manufacturer dataset

The IB data are from an MDF manufacturer in North America and are sorted by three key characteristics: density (kg/m³), thickness (mm), and width (mm). These three characteristics differentiate the MDF produced by the manufacturer for various applications. Since the MDF in this study was produced in continuous sheets, length was not a crucial variable for the purposes here, as indicated by the manufacturer. For analysis, the MDF was separated into two main groups: Group I (standard density) and Group II (high density). The standard density type is MDF with densities of 721–737 kg/m³. The high density type is MDF with densities of 753–769 kg/m³.

Since the manufacturer produced a number of MDF product types within each group, two types were selected for more detailed analysis: Type 1 (737 kg/m³, 15.9 mm thick, 1,550 mm wide) from Group I and Type 5 (769 kg/m³, 15.9 mm thick, 1,550 mm wide) from Group II. These two types were chosen because they are commonly used MDF products and allow for useful comparisons. Type 1, the producer's best-selling product, had n = 396 observations, while Type 5, a higher-valued product, had n = 74 observations. These two types illustrate two extremes in the data.

Bootstrap methods and confidence intervals

The fundamental idea behind the bootstrap is that the empirical bootstrap distribution provides an approximation to the theoretical sampling distribution of the statistic of interest. Meeker and Escobar (1998) contend that bootstrap methods, “when used properly, can be expected to be more accurate than the normal-approximation methods and competitive with the likelihood-based methods.” Bootstrapping is a computer-intensive statistical method that simulates the sampling process a specified (usually large) number of times to obtain an approximate sampling distribution of interest. This empirical bootstrap distribution is then used to derive characteristics of the estimator (e.g., standard error, bias estimates, confidence intervals) for the population parameter; see Chernick (1999), an excellent book on many bootstrap methods and their applications. Efron and Tibshirani (1993) provide an excellent introduction to the fundamental concepts and applications of bootstrapping. DiCiccio and Efron (1996) is devoted to the construction of bootstrap confidence intervals.

Bootstrap sampling methods

This study begins with the fully nonparametric bootstrap and adopts the notation of Martinez and Martinez (2002). In general, the basic nonparametric bootstrap procedure can be summarized as follows. For a given data set x = $ (x_{1} ,x_{2} , \ldots ,x_{n} ) $ of size n, a population parameter, say θ, is estimated nonparametrically by $ \hat{\theta }. $ For instance, the pth percentile is estimated as the (p/100)(n + 1)st ordered observation in x. One then samples with replacement from the original data set (i.e., a unit is drawn from the sample and then returned to it, so that it may be drawn again) to obtain a bootstrap sample of the same size n as the original data, denoted by $ x^{*b} = (x_{1}^{*b} ,x_{2}^{*b} , \ldots ,x_{n}^{*b} ). $ This resampling with replacement is repeated by simulation a large number of times, B. For each bootstrap sample, a new estimate of θ is calculated, denoted by $ \hat{\theta }^{*b} $, where b indexes the bth bootstrap estimate. The empirical bootstrap distribution of $ \hat{\theta }^{*} $ is then used as an estimate of the true sampling distribution of $ \hat{\theta }. $ This method of sampling has the advantage of requiring no distributional assumptions.
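The resampling procedure just described can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in the study: the function names are hypothetical, and the linear interpolation for non-integer positions and the clamping to the sample range are assumptions added here, since the text only specifies the (p/100)(n + 1)st observation.

```python
import random


def sample_percentile(x, p):
    """p-th percentile as the (p/100)(n + 1)-st ordered observation.

    Interpolating between neighbouring order statistics and clamping to the
    sample range are assumptions of this sketch.
    """
    xs = sorted(x)
    n = len(xs)
    pos = (p / 100) * (n + 1)      # 1-based position in the ordered sample
    if pos <= 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    lo = int(pos)                  # integer part of the position
    frac = pos - lo                # fractional part used for interpolation
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])


def bootstrap_estimates(x, stat, B=2000, seed=0):
    """Return B bootstrap replicates of stat, resampling x with replacement."""
    rng = random.Random(seed)
    n = len(x)
    # Each bootstrap sample has the same size n as the original data.
    return [stat(rng.choices(x, k=n)) for _ in range(B)]
```

With a list of IB measurements `ib` (a hypothetical name), `bootstrap_estimates(ib, lambda s: sample_percentile(s, 10))` would return the bootstrap replicates of the 10th percentile.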

The completely parametric bootstrap, which requires the assumption of a parametric distribution, is described briefly in Efron and Tibshirani (1993), Meeker and Escobar (1998), and Chernick (1999). Meeker and Escobar (1998) point out that the parametric bootstrap has a disadvantage in reliability data problems: because data are simulated from an assumed parametric distribution, the complete censoring process must also be specified. This is unproblematic in simple cases where such specification is easy; for example, the strength data in this study are complete. However, it can be more difficult for complicated systematic or random censoring. Thus, the fully parametric form of sampling is not emphasized in this paper.

As an alternative method, Meeker and Escobar (1998) describe and illustrate applications of a “nonparametric” bootstrap sampling method for parametric inference, which is denoted here, for the sake of simplicity, as NBSP (nonparametric bootstrap sampling for parametric models). This sampling scheme does require parametric assumptions. However, rather than simulating random variates from an assumed parametric distribution, one samples with replacement from the original data. For each bootstrap sample of size n, MLEs are obtained based on the assumed parametric model. These MLEs are used to estimate the population parameter of interest and form the bootstrap distribution. For instance, under a log-location-scale model such as the lognormal, a parametric estimate of the pth percentile is given by $ \hat{t}_{p} = \exp [\hat{\mu } + \Upphi^{ - 1} (p)\hat{\sigma }] $, which requires the MLEs $ \hat{\mu }{\text{ and }}\hat{\sigma } $.
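A minimal sketch of the NBSP scheme follows, assuming a lognormal model with complete data so that the MLEs are available in closed form; the model choice and the function names are assumptions made for illustration only. The resampling step is the same as in the fully nonparametric case, and only the estimator applied to each bootstrap sample changes.

```python
import math
import random
from statistics import NormalDist


def lognormal_quantile(sample, p):
    """Parametric estimate t_p = exp(mu + Phi^{-1}(p) * sigma) from lognormal MLEs."""
    logs = [math.log(v) for v in sample]
    n = len(logs)
    mu = sum(logs) / n                                        # MLE of location
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)   # MLE of scale
    return math.exp(mu + NormalDist().inv_cdf(p) * sigma)


def nbsp_estimates(x, p, B=2000, seed=0):
    """Resample the data with replacement, refit the MLEs, re-estimate t_p."""
    rng = random.Random(seed)
    n = len(x)
    return [lognormal_quantile(rng.choices(x, k=n), p) for _ in range(B)]
```

Note that p is a probability here (for example 0.10 for the 10th percentile), matching the argument of $ \Upphi^{ - 1} $.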

Bootstrap confidence intervals

Different methods are available for constructing bootstrap confidence intervals for population parameters. The authors emphasize the standard normal bootstrap confidence interval, the bootstrap percentile interval, and the bias-corrected bootstrap percentile interval. Many of the theoretical details are omitted. For the theoretical underpinnings and additional topics see, among others, Efron and Tibshirani (1993), DiCiccio and Efron (1996), and Davison and Hinkley (1997). The standard bootstrap confidence interval is given by:

$$ [\hat{\theta } + z^{(\alpha /2)} \hat{s}_{{\hat{\theta }}} { , }\hat{\theta } + z^{(1 - \alpha /2)} \hat{s}_{{\hat{\theta }}} ] $$

(2.1)

where $ \hat{s}_{{\hat{\theta }}} $ is obtained by computing the standard deviation of the B bootstrap estimates of θ and $ z^{(\alpha /2)} $ is the $ \alpha /2{\text{th}} $ quantile of the standard normal distribution. Because $ z^{(\alpha /2)} = - z^{(1 - \alpha /2)} $, the interval is symmetric about $ \hat{\theta } $. The necessary steps are provided in Algorithm 1 below. The algorithms that follow are given for the fully nonparametric case with the NBSP method alternatives shown in parentheses.

Algorithm 1: standard bootstrap confidence interval:

Step 1. From the original sample of size n, estimate the parameter(s) of interest (denoted by $ \hat{\theta } $). (For the NBSP method, obtain MLEs of the assumed parametric distribution and use them to estimate the parameter(s) of interest.)
Step 2. Sample with replacement from the original sample to create a bootstrap sample of size n.
Step 3. Estimate the parameter(s) of interest from the bootstrap sample to obtain $ \hat{\theta }^{*b} $. (For the NBSP method, calculate the MLEs of the assumed parametric distribution based on the bootstrap sample and use them to estimate the parameter(s).)
Step 4. Repeat steps 2 and 3 a pre-specified B ≥ 1,000 times to form the bootstrap distribution.
Step 5. Calculate the standard deviation of the B bootstrap estimates ($ \hat{s}_{{\hat{\theta }}} $) and use this to estimate the standard error, $ s_{{\hat{\theta }}} $.
Step 6. Use (2.1) to obtain the confidence interval.
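The six steps of Algorithm 1 can be sketched as follows. This is an illustrative implementation under stated assumptions: the function name is hypothetical, `stat` is any function mapping a sample to an estimate, and the sample standard deviation with divisor B − 1 (`statistics.stdev`) is used in Step 5, a choice the text does not specify.

```python
import random
from statistics import NormalDist, stdev


def standard_bootstrap_ci(x, stat, alpha=0.05, B=2000, seed=0):
    """Standard normal bootstrap interval following Algorithm 1."""
    rng = random.Random(seed)
    n = len(x)
    theta_hat = stat(x)                                     # Step 1
    reps = [stat(rng.choices(x, k=n)) for _ in range(B)]    # Steps 2-4
    se = stdev(reps)                                        # Step 5
    z = NormalDist().inv_cdf(1 - alpha / 2)                 # z^(1 - alpha/2)
    return theta_hat - z * se, theta_hat + z * se           # Step 6
```

Passing a nonparametric percentile estimator as `stat` gives the fully nonparametric version; passing an estimator that refits MLEs on each sample gives the NBSP version.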

Perhaps one of the most obvious ways to construct a confidence interval is to base it on the quantiles of the bootstrap distribution of estimates, which is known as the percentile method.

Algorithm 2: bootstrap percentile confidence interval:

Steps 1, 2, 3, and 4. Same as in Algorithm 1.
Step 5. Order the B bootstrap estimates, $ \hat{\theta }^{*b} $.
Step 6. Determine the $ (\alpha /2){\text{th}} $ and $ (1 - \alpha /2){\text{th}} $ quantiles of the distribution of $ \hat{\theta }^{*} $, denoted by $ \hat{\theta }^{*(\alpha /2)} {\text{ and }}\hat{\theta }^{*(1 - \alpha /2)} $, respectively.
Step 7. Form the 1 − α confidence interval as $ [\hat{\theta }^{*(\alpha /2)} { , }\hat{\theta }^{*(1 - \alpha /2)} ] $.
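Algorithm 2 differs from Algorithm 1 only in how the bootstrap estimates are turned into limits. In the sketch below, the rule for picking a quantile from the ordered estimates (the ⌈qB⌉th ordered value) is an assumption, since several conventions are in use; the function names are hypothetical.

```python
import math
import random


def ordered_quantile(sorted_reps, q):
    """q-th quantile as the ceil(q * B)-th ordered value (one common convention)."""
    B = len(sorted_reps)
    idx = min(B - 1, max(0, math.ceil(q * B) - 1))
    return sorted_reps[idx]


def percentile_bootstrap_ci(x, stat, alpha=0.05, B=2000, seed=0):
    """Bootstrap percentile interval following Algorithm 2."""
    rng = random.Random(seed)
    n = len(x)
    # Steps 2-5: resample, re-estimate, and order the B bootstrap estimates.
    # (Step 1, the estimate from the original sample, is not needed for the limits.)
    reps = sorted(stat(rng.choices(x, k=n)) for _ in range(B))
    lower = ordered_quantile(reps, alpha / 2)        # Step 6
    upper = ordered_quantile(reps, 1 - alpha / 2)
    return lower, upper                              # Step 7
```

Unlike the standard interval, the resulting limits need not be symmetric about the original estimate.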

Though the percentile method is easy to implement, Chernick (1999) points out that it works well if exactly 50% of the bootstrap distribution is less than $ \hat{\theta } $, a condition that might well not hold, and that “in the case of small samples, the percentile method does not work well.” Fortunately, there are methods that improve on the percentile method.

The bias-corrected percentile interval (or BC) was introduced in Efron (1981) and discussed further in Efron (1987). A bias-correction constant is defined as the amount of difference between the median of the bootstrap estimates $ \hat{\theta }^{*b} $ and the estimate, $ \hat{\theta } $, from the original sample. Explicitly, the estimate of the bias-correction constant, denoted by $ \hat{z}_{0} $, is defined as:

$$ \hat{z}_{0} = \Upphi_{\text{NOR}}^{ - 1} \left( {{\frac{{\# (\hat{\theta }^{*b} < \hat{\theta })}}{B}}} \right) $$

(2.2)

where $ \Upphi_{\text{NOR}}^{ - 1} $ represents the inverse cumulative standard normal distribution and # means “number of”. Then, a 100(1 − α)% BC confidence interval for θ is given by:

$$ [\hat{\theta }^{{*(\alpha_{1} )}} , \hat{\theta }^{{ * (\alpha_{ 2} )}} ] $$

(2.3)

where $ \alpha_{1} {\text{ and }}\alpha_{2} $ are the new quantities on which to base the percentile confidence interval endpoints. These quantities are defined as:

$$ \alpha_{1} = \Upphi_{\text{NOR}} (2\hat{z}_{0} + z^{(\alpha /2)} ) $$

(2.4)

and

$$ \alpha_{2} = \Upphi_{\text{NOR}} (2\hat{z}_{0} + z^{(1 - \alpha /2)} ) $$

(2.5)

where $ \Upphi_{\text{NOR}} $ is the cumulative standard normal distribution.
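Two short worked cases may help in reading (2.2), (2.4), and (2.5); the 40% figure in the second case is a hypothetical value chosen for illustration. If exactly half of the bootstrap estimates fall below $ \hat{\theta } $, then

$$ \hat{z}_{0} = \Upphi_{\text{NOR}}^{ - 1} (0.5) = 0,\quad \alpha_{1} = \Upphi_{\text{NOR}} (z^{(\alpha /2)} ) = \alpha /2,\quad \alpha_{2} = \Upphi_{\text{NOR}} (z^{(1 - \alpha /2)} ) = 1 - \alpha /2 $$

so the BC interval coincides with the percentile interval. If instead only 40% of the bootstrap estimates fall below $ \hat{\theta } $ and α = 0.05, then

$$ \hat{z}_{0} = \Upphi_{\text{NOR}}^{ - 1} (0.4) \approx - 0.253,\quad \alpha_{1} \approx \Upphi_{\text{NOR}} ( - 0.507 - 1.960) \approx 0.007,\quad \alpha_{2} \approx \Upphi_{\text{NOR}} ( - 0.507 + 1.960) \approx 0.927 $$

so both cutoffs shift toward the lower tail of the bootstrap distribution.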

Algorithm 3: bias-corrected percentile bootstrap confidence interval:

Steps 1, 2, 3, and 4. Same as Algorithm 1.
Step 5. Calculate the bias-correction constant, $ \hat{z}_{0} $, as given in (2.2).
Step 6. Determine the new cutoff percentages, $ \alpha_{1} {\text{ and }}\alpha_{2} $, as given in (2.4) and (2.5).
Step 7. Order the bootstrap estimates, $ \hat{\theta }^{*b} $.
Step 8. Determine the $ \alpha_{1} {\text{th}} $ and $ \alpha_{2} {\text{th}} $ quantiles of the distribution of $ \hat{\theta }^{*} $ denoted by $ \hat{\theta }^{{*(\alpha_{1} )}} {\text{ and }}\hat{\theta }^{{*(\alpha_{2} )}} $ respectively.
Step 9. Form the 1 − α confidence interval as given in (2.3).
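Algorithm 3 can be sketched in the same style. Two details are assumptions of this sketch and are not taken from the text: the proportion in (2.2) is clamped away from 0 and 1 so that the inverse normal is defined, and quantiles of the ordered estimates use the ⌈qB⌉th ordered value. The function name is hypothetical.

```python
import math
import random
from statistics import NormalDist


def bc_bootstrap_ci(x, stat, alpha=0.05, B=2000, seed=0):
    """Bias-corrected percentile interval following Algorithm 3."""
    rng = random.Random(seed)
    nd = NormalDist()
    n = len(x)
    theta_hat = stat(x)                                          # Step 1
    reps = sorted(stat(rng.choices(x, k=n)) for _ in range(B))   # Steps 2-4, 7
    # Step 5: bias-correction constant, Eq. (2.2).
    prop = sum(r < theta_hat for r in reps) / B
    prop = min(max(prop, 1 / (B + 1)), B / (B + 1))   # clamp (assumption)
    z0 = nd.inv_cdf(prop)
    # Step 6: adjusted cutoffs, Eqs. (2.4) and (2.5).
    a1 = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    a2 = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))
    # Steps 8-9: read the limits off the ordered bootstrap estimates, Eq. (2.3).
    lo_idx = min(B - 1, max(0, math.ceil(a1 * B) - 1))
    hi_idx = min(B - 1, max(0, math.ceil(a2 * B) - 1))
    return reps[lo_idx], reps[hi_idx]
```

When the statistic takes few distinct values across bootstrap samples, as a sample percentile can, many replicates may tie with the original estimate, and the strict inequality in (2.2) then has a visible effect on the bias-correction constant.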

Results and discussion

For each method of sampling, the standard normal, percentile, and bias-corrected percentile bootstrap intervals were constructed and compared for the 1st, 10th, 25th, and 50th (median) percentiles for MDF product Types 1 and 5. These two types were chosen to help illustrate the benefits and limitations of the bootstrap. Recall their respective sample sizes, n = 396 and n = 74. For each method of sampling, B = 2,000 bootstrap samples of the same size as the original sample were created. In many cases, but not always, this should be a sufficient number of bootstrap samples to create the confidence intervals. The asymptotic normal confidence intervals are also provided for comparison with the bootstrap results.

Table 1 provides the 95% asymptotic normal confidence intervals for Type 1 MDF, while Table 2 shows the fully nonparametric 95% bootstrap confidence intervals. In the tables that follow, LCL stands for lower confidence limit and UCL stands for upper confidence limit. Figure 1 displays the nonparametric empirical bootstrap sampling distribution for each of the four quantiles. An initial look at the bootstrap sampling distributions shown in Fig. 1 indicates that the bootstrap distribution becomes narrower and more peaked as the percentiles increase from 1 to 50, reflecting smaller variability in the sampling distribution.

Table 1

95% Asymptotic normal confidence intervals for IB strength of Type 1 MDF

p	$ \hat{t}_{p} $ = quantile (kPa)	LCL (kPa)	UCL (kPa)
.01	670.7	657.8	683.6
.10	741.8	732.8	750.9
.25	783.1	775.7	790.6
.50	829.1	822.4	835.8