2.3 Probability distributions of input variables
In a probabilistic model, we can specify the input data as a probability distribution (continuous or discrete). So, from now on, we will assume that
x1,
x2, … are not fixed numbers, but that they are stochastic (random) numbers, following some probability distribution. We will use the convention from probability theory to indicate stochastic variables with capital letters, like
X1,
X2, … Further, the symbol ~ indicates that a stochastic variable is distributed according to some probability distribution. For instance,
$$ \left\{\begin{array}{l}{X}_1\sim N\left({\mu}_{X_1},{\sigma}_{X_1}\right)\\ {}{X}_2\sim N\left({\mu}_{X_2},{\sigma}_{X_2}\right)\\ {}\cdots \sim \cdots \end{array}\right. $$
where
N(
μ,
σ) is the normal (Gaussian) probability distribution with parameters
μ and
σ. We might go for other probability distributions (uniform, log-normal, binomial, etc.) but at this stage want to keep the discussion simple. The numbers that specify the numerical details of the probability distribution (here
μ and
σ in general, and more specifically
\( {\mu}_{X_1} \),
\( {\mu}_{X_2} \),
\( {\sigma}_{X_1} \),
\( {\sigma}_{X_2} \), etc.) are referred to as parameters. So it is not
x1 that is a parameter (as the usual terminology in LCA would have it), but rather
\( {\mu}_{X_1} \) and
\( {\sigma}_{X_1} \) are parameters of the distribution of
X1. Other types of distributions are usually specified with different types of parameters (for instance, the uniform distribution with a parameter for the lower limit and a parameter for the upper limit) or even with a different number of parameters (for instance, the Poisson distribution requires only one parameter, while the asymmetric triangular distribution requires three parameters).
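As a concrete illustration, the sketch below specifies a few input variables as frozen distributions with `scipy.stats`; all numerical parameter values are made up for illustration.

```python
# Sketch: specifying input variables as probability distributions.
# All parameter values below are illustrative, not real LCA data.
from scipy import stats

# X1 ~ N(mu, sigma): the normal takes a location (mu) and scale (sigma).
X1 = stats.norm(loc=10.0, scale=2.0)

# The uniform distribution is parameterized differently: a lower limit
# (loc) and a width (scale), so it covers [loc, loc + scale].
X2 = stats.uniform(loc=0.0, scale=5.0)   # uniform on [0, 5]

# The Poisson distribution requires only one parameter.
X3 = stats.poisson(mu=3.0)

# Drawing a random value from each (the tilde "~" in the text):
print(X1.rvs(), X2.rvs(), X3.rvs())
```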
2.4 Probability distributions of output variables
Recognizing that (some of) the input variables of the model
f(·) are stochastic, a logical consequence is that the model output is also stochastic. Thus, we write
$$ Y=f\left({X}_1,{X}_2,\dots \right) $$
See Heijungs et al. (
2019). With this change of
y into
Y, our task shifts from calculating the value of
y to calculating the distribution of
Y. More specifically, we may want to know:
The shape of the distribution of Y (i.e., normal, uniform, log-normal, binomial, etc.)
The value or values of the parameter or parameters (e.g., μY and σY)
Probability theory offers methods to calculate the probability distribution of
Y when those of
X1,
X2, … are given, but only for a few cases of
f(·) and only for a few input distributions. For instance, when
Y =
f(
X1,
X2) =
X1 +
X2 and
X1 and
X2 are normal, every textbook shows that
$$ Y\sim N\left({\mu}_{X_1}+{\mu}_{X_2},\sqrt{\sigma_{X_1}^2+{\sigma}_{X_2}^2}\right) $$
In words, the sum of two normal variables is itself normally distributed, and the parameters
μY and
σY can easily be calculated from the parameters of the input distributions. Another case is
\( Y=f\left({X}_1,{X}_2\right)={X}_1^2+{X}_2^2 \). This is pretty complicated, but when we take the special case of
\( {\mu}_{X_1}={\mu}_{X_2}=0 \) and
\( {\sigma}_{X_1}={\sigma}_{X_2}=1 \), it is a well-known result:
$$ Y\sim {\chi}^2(2) $$
where
χ2(
ν) is the chi-squared distribution with parameter
ν. In general, most choices of
f(·) with less trivial combinations of
X1,
X2, … (such as
\( f\left({X}_1,{X}_2\right)={X}_1{X}_2^2+\frac{\ln {X}_1}{4+\sin {X}_2} \)) are not manageable by the theory of probability. It is therefore important to have an alternative way to determine the probability function of such more complicated functions of stochastic variables. The same applies also to situations where
f(·) is straightforward, but where the input distributions for
X1,
X2, etc. are not normal.
The Monte Carlo approach (Metropolis and Ulam
1949; Shonkwiler and Mendivil
2009) can be used as an alternative way for constructing the probability distribution of
Y in case the mathematical approach is too hard. It is based on artificially sampling values from
Y, and using this sample for reconstructing (the technical term is estimating) the shape and the parameter values of
Y. We will spend the next section on the topic of estimating a probability distribution from a sample of values. This is a topic of more general interest than Monte Carlo simulations, so we will keep the discussion quite general, also covering the case of estimating the distribution of input variables like
X1 and
X2.
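The idea can be sketched in a few lines of Python. The code below draws a large sample from assumed, purely illustrative input distributions and evaluates the complicated function mentioned above; X1 is taken log-normal so that ln X1 is defined.

```python
# Monte Carlo sketch for a function whose output distribution has no
# simple analytic form: f(X1, X2) = X1*X2^2 + ln(X1) / (4 + sin(X2)).
# The input distributions are illustrative assumptions (X1 log-normal
# so that X1 > 0 and ln(X1) exists).
import numpy as np

rng = np.random.default_rng(seed=1)
n_run = 100_000

x1 = rng.lognormal(mean=0.0, sigma=0.25, size=n_run)   # X1 > 0
x2 = rng.normal(loc=1.0, scale=0.5, size=n_run)

y = x1 * x2**2 + np.log(x1) / (4 + np.sin(x2))

# The sample now characterizes the distribution of Y empirically:
print("mean:", y.mean(), "sd:", y.std(ddof=1))
```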
2.5 Estimating a probability distribution in general
We will discuss the question of estimating a probability distribution
Z (including its parameters), given a sample of data,
z1,
z2, …,
zn. This task is known as the estimation problem, and it is one of the central topics of inferential statistics. See, for instance, Rice (
2007) and Casella and Berger (
2002) for general textbooks.
Suppose we have a sample of data from an unknown stochastic process,
Z. Let the sampled values be indicated by
zi, for
i = 1, …,
n. If we want to estimate the probability distribution belonging to the stochastic process that generated this sample, we must first make an assumption about the type of distribution. Is it a normal distribution, a uniform distribution, a log-normal distribution, a Weibull distribution? This choice is one of the trickiest parts of the entire estimation process, because there is no clear guidance. Different aspects can play a role here:
Evidence: the data (e.g., a histogram or a boxplot) may suggest a certain distribution.
Conventions and compatibility with software: the log-normal distribution has a longer and more widespread history in LCA than the Erlang distribution.
Familiarity and simplicity: if the histogram looks approximately bell-shaped, a normal distribution is more natural than the Cauchy distribution.
Statistical criteria: we can use statistical tests (such as those by Kolmogorov-Smirnov and Anderson-Darling) to assess the goodness-of-fit with a number of probability distributions.
Clearly, there are also cases where none of the conventional model distributions provides a satisfactory fit with the empirical data. We will not further discuss such cases, because the usual procedure in LCA is to model input uncertainties in terms of just a few distributions: lognormal, normal, uniform, or triangular (Frischknecht et al.
2004) or perhaps a few more (gamma and beta PERT; see Muller et al.
2016).
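As an illustration of the statistical criteria mentioned above, the sketch below fits two candidate distributions to a simulated sample and compares their Kolmogorov-Smirnov statistics with `scipy.stats.kstest`. The data are simulated, and strictly speaking the p-values are optimistic when the parameters are fitted from the same data.

```python
# Sketch of a goodness-of-fit check with the Kolmogorov-Smirnov test.
# The sample is simulated for illustration (here from a log-normal).
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)
z = rng.lognormal(mean=1.0, sigma=0.5, size=200)

# Fit a normal and a log-normal to the same data, then compare fits.
mu, sigma = z.mean(), z.std(ddof=0)
shape, loc, scale = stats.lognorm.fit(z, floc=0)

d_norm, p_norm = stats.kstest(z, "norm", args=(mu, sigma))
d_logn, p_logn = stats.kstest(z, "lognorm", args=(shape, loc, scale))

# A smaller D statistic indicates a better fit.
print(f"normal:     D={d_norm:.3f}, p={p_norm:.3f}")
print(f"log-normal: D={d_logn:.3f}, p={p_logn:.3f}")
```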
Once we have selected a probability distribution, the next task is to estimate the parameter value or values of that distribution. Suppose we have selected a normal distribution, so
$$ Z\sim N\left({\mu}_Z,{\sigma}_Z\right) $$
where
μZ and
σZ are the distribution’s parameters, which are still unknown at this stage of the analysis. Then, our task is to estimate the values of
μZ and
σZ that correspond best with the sampled data. Different estimation principles are available in the statistical literature to do this. Two widely used principles are the method of moments and the method of maximum likelihood. For the case of a normal distribution, these two principles yield the same estimate of
μZ and
σZ, but for some distributions, there is a difference in the outcome of the estimation procedure. Anyhow, the theory of statistics offers formulas for estimators, which are functions of the observations. We can use the symbol of the parameter to be estimated with a hat on top of it to indicate the estimator:
\( \hat{\mu} \) is an estimator of
μ and
\( \hat{\sigma} \) is an estimator of
σ. In the case of a normal distribution, both estimation principles (method of moments and method of maximum likelihood) suggest
$$ {\hat{\mu}}_Z=\frac{1}{n}\sum \limits_{i=1}^n{Z}_i $$
and
$$ {\hat{\sigma}}_Z=\sqrt{\frac{1}{n}\sum \limits_{i=1}^n{\left({Z}_i-{\hat{\mu}}_Z\right)}^2} $$
as estimators for
μZ and
σZ. When applied to a concrete data set,
z1,
z2, …,
zn, these estimators produce a concrete value, because we insert the observed values of
zi at the place of the stochastic variable
Zi. These concrete values are the estimates, which we will indicate hereafter as
\( \overline{z} \) and
sZ.
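Applied to a concrete (here invented) data set, the two estimators above can be computed as follows; note the 1/n rather than 1/(n−1) inside the square root, matching the maximum-likelihood formula above (`ddof=0` in NumPy terms).

```python
# The two estimators above, applied to a concrete sample z1..zn.
# The data values are illustrative.
import numpy as np

z = np.array([4.2, 5.1, 3.8, 4.9, 5.4, 4.4])

mu_hat = z.mean()              # (1/n) * sum(z_i)
sigma_hat = z.std(ddof=0)      # sqrt((1/n) * sum((z_i - mu_hat)^2))

print(f"z_bar = {mu_hat:.3f}, s_Z = {sigma_hat:.3f}")
```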
Of course, we cannot expect that the estimates will be fully accurate if the sample size is finite. The estimate \( \overline{z} \) will be hopefully close to the true value μZ, but probably it will be a little bit off (that is also why we distinguish the symbols: in general \( \overline{z}\ne {\mu}_{\mathrm{Z}} \), but \( \overline{z}\approx {\mu}_{\mathrm{Z}} \)). The same applies to the estimate sZ of σZ.
The theory of inferential statistics not only allows us to estimate the values, but also allows us to say something about the level of precision of such estimates. This is done through the theory of sampling distributions, standard errors, and confidence intervals.
A sampling distribution is the probability distribution of an estimator. Let us suppose we have a probability distribution Z ∼ N(μZ, σZ), with unknown parameter μZ and known parameter σZ, from which we sample n observations, and use the estimator \( {\hat{\mu}}_{\mathrm{Z}} \) to estimate μZ by the value \( \overline{z} \). If we were to take another sample of size n, we could use the same estimator to again estimate μZ, but we would find a slightly different value \( \overline{z} \), because the sample would contain different values. Repeating this again and again, always with the same sample size n, we end up with a distribution of \( \overline{z} \) values. This distribution will be referred to as \( \overline{Z} \).
The famous central limit theorem states that the distribution of the estimates of the mean,
\( \overline{Z} \), is normally distributed and that there is a simple relation between its parameters (
\( {\mu}_{\overline{Z}} \) and
\( {\sigma}_{\overline{Z}} \)) and the parameters of the parent distribution
Z (
μZ and
σZ):
$$ \overline{Z}\sim N\left({\mu}_{\mathrm{Z}},\frac{\sigma_{\mathrm{Z}}}{\sqrt{n}}\right) $$
So, \( {\mu}_{\overline{\mathrm{Z}}}={\mu}_{\mathrm{Z}} \) and \( {\sigma}_{\overline{\mathrm{Z}}}=\frac{\sigma_{\mathrm{Z}}}{\sqrt{n}} \). The first fact signifies that the mean of the sample means corresponds to the mean of the parent distribution. This is a convenient property, because it allows us to use the sample mean (\( \overline{z} \)) as the best guess of μZ. The second fact tells us that the width of the distribution of \( \overline{Z} \) (so \( {\sigma}_{\overline{\mathrm{Z}}} \)) depends on the width of the distribution of Z (so on σZ) and on the size of the sample (so on n). In fact, \( {\sigma}_{\overline{\mathrm{Z}}} \) decreases without limit as n increases. The important consequence is that the estimate of μZ, \( \overline{z} \), is more precise when n is large, and that we can make it as precise as we want by just increasing the sample size. The larger the sample, the more precise the estimate.
The quantity
\( {\sigma}_{\overline{\mathrm{Z}}}=\frac{\sigma_{\mathrm{Z}}}{\sqrt{n}} \) is known as the standard error of the mean, also known as “the” standard error. For a precise estimation of
μZ, we want this
\( {\sigma}_{\overline{\mathrm{Z}}} \) to be small. The only way to do so is to use a large sample size
n, because
σZ is fixed. The standard error is related to the concept of a confidence interval. For the case of estimating
μZ, the 95% confidence interval is given by
$$ C{I}_{\mu_{\mathrm{Z}};0.95}=\left[\overline{z}-1.96{\sigma}_{\overline{\mathrm{Z}}},\overline{z}+1.96{\sigma}_{\overline{\mathrm{Z}}}\right] $$
This means that with 95% confidence, the interval CI will contain the true value μZ that we are supposed to estimate by \( \overline{z} \). Observe that the confidence interval has a width of \( 2\times 1.96{\sigma}_{\overline{Z}}=3.92{\sigma}_{\overline{Z}}=3.92\frac{\sigma_Z}{\sqrt{n}} \). If we want this interval to be smaller, we need to increase sample size n.
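A minimal sketch of this confidence-interval formula, with illustrative parameter values and an assumed known σZ:

```python
# Sketch: 95% confidence interval for mu_Z with known sigma_Z, using
# CI = [z_bar - 1.96*se, z_bar + 1.96*se]. All values are illustrative.
import numpy as np

rng = np.random.default_rng(seed=3)
mu_Z, sigma_Z = 10.0, 2.0      # "true" parameters; sigma_Z assumed known
n = 100

z = rng.normal(mu_Z, sigma_Z, size=n)
z_bar = z.mean()
se = sigma_Z / np.sqrt(n)      # standard error of the mean

ci = (z_bar - 1.96 * se, z_bar + 1.96 * se)
print(f"estimate {z_bar:.3f}, 95% CI [{ci[0]:.3f}, {ci[1]:.3f}]")
# Quadrupling n would halve the width of this interval.
```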
Above, we discussed how to estimate the parameter μ when the parameter σ is known. Estimation of σ and other parameters, and estimation of μ when σ is unknown, are technically more difficult, but conceptually the idea is the same.
2.6 Estimating the probability distribution of input variables
When we want to estimate the probability distribution of an input variable (
X1, etc.), we carry out the following steps:
We sample data (x11, x12, …, x1n) from the phenomenon (e.g., unit process).
We choose a convenient probability distribution shape (e.g., normal).
We use the formulas for the estimators (\( {\hat{\mu}}_{X_1} \), \( {\hat{\sigma}}_{X_1} \), etc.) to find estimates (\( \overline{x_1} \), \( {s}_{X_1} \), etc.).
The estimated parameter values (\( \overline{x_1} \), \( {s}_{X_1} \), etc.) are “best guesses” given the available data. However, we cannot expect them to be perfect estimates, because the width of the confidence interval of these estimates decreases only with \( \frac{1}{\sqrt{n}} \), and n is usually limited. Of course, we can increase n by collecting more primary data, but site visits and measurements are usually expensive and time-consuming. For that reason, in LCA, as in most other fields of science, n is usually quite limited. The price we pay for that is a larger standard error and a wider confidence interval.
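The three steps above can be sketched as follows; the “true” process N(50, 5) is of course hypothetical and hidden from the analyst.

```python
# The three steps above as a sketch: sample, assume a shape, estimate.
import numpy as np

rng = np.random.default_rng(seed=4)

# Step 1: a (small) sample from the phenomenon; here simulated from a
# hypothetical "true" process N(50, 5) that the analyst cannot see.
n = 10
x1_sample = rng.normal(50.0, 5.0, size=n)

# Step 2: assume a normal shape. Step 3: estimate its parameters.
x1_bar = x1_sample.mean()
s_x1 = x1_sample.std(ddof=0)

# With n = 10 the estimates are best guesses, not the true 50 and 5:
print(f"estimated mu: {x1_bar:.2f}, estimated sigma: {s_x1:.2f}")
```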
2.7 Estimating the probability distribution of output variables, given perfectly known inputs
Next, we move to the topic of estimating the probability distribution of an output variable (
Y, etc.). Suppose, for simplicity, we have one stochastic input variable,
X, normally distributed, with known parameters:
$$ X\sim N\left({\mu}_X,{\sigma}_X\right) $$
Next, we define a very simple function of that variable:
$$ Y=f(X)=X $$
Of course, the distribution of the output variable
Y is trivial:
$$ Y\sim N\left({\mu}_X,{\sigma}_X\right) $$
and in particular,
μY =
μX. But, let us pretend we are bad in probability theory and prefer to use a Monte Carlo approach. We simulate
Nrun instances of
X (namely
\( {x}_1,{x}_2,\dots, {x}_{N_{\mathrm{run}}} \)) and use that to calculate
Nrun instances of
Y (namely
y1 =
x1,
y2 =
x2, etc.). These values of
y are used to estimate
μY as follows:
$$ \overline{y}=\frac{1}{N_{\mathrm{run}}}\sum \limits_{i=1}^{N_{\mathrm{run}}}{y}_{\mathrm{i}} $$
When the sample has been obtained in a random way, we can also be sure that the estimate will converge to the correct value:
$$ \underset{N_{\mathrm{run}}\to \infty }{\lim}\overline{y}={\mu}_{\mathrm{Y}}={\mu}_{\mathrm{X}} $$
Likewise, we can estimate the standard deviation of
Y,
σY. This can be used to find the standard error of the mean
$$ {s}_{\overline{\mathrm{Y}}}=\frac{s_{\mathrm{Y}}}{\sqrt{N_{\mathrm{run}}}} $$
The noteworthy aspect of this standard error is that it will go to zero when
Nrun grows very large:
$$ \underset{N_{\mathrm{run}}\to \infty }{\lim }{s}_{\overline{\mathrm{Y}}}=0 $$
As a consequence, the estimate of
μY will become arbitrarily precise, if we have enough computer time:
$$ \underset{N_{\mathrm{run}}\to \infty }{\lim }C{I}_{\mu_{\mathrm{Y}};0.95}=\left[{\mu}_{\mathrm{Y}},{\mu}_{\mathrm{Y}}\right]=\left[{\mu}_{\mathrm{X}},{\mu}_{\mathrm{X}}\right] $$
That is not surprising. If we had been more thoughtful, we could have saved the computing expense and directly deduced that μY = μX, with infinite precision. The situation is comparable to computing \( \frac{1}{2}+\frac{1}{4}+\frac{1}{8}+\frac{1}{16}+\dots \) for a large number of terms, or being more thoughtful and directly writing this as \( \frac{\frac{1}{2}}{1-\frac{1}{2}}=1 \). Both approaches yield approximately the same result. So, when we want to use a Monte Carlo approach to estimate the parameters of a probability distribution, we must use a large sample size Nrun to find a reliable answer. The recommendations quoted in the introduction (1000, 10,000, 100,000) are based on the situation described here: accurately estimating an output distribution on the basis of perfect knowledge of the input distributions.
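A minimal Monte Carlo sketch of this situation, with illustrative parameter values, shows the standard error shrinking as Nrun grows:

```python
# Monte Carlo for the trivial case Y = f(X) = X with perfectly known
# X ~ N(mu_X, sigma_X). The standard error of y_bar shrinks as
# 1/sqrt(N_run), so mu_Y can be estimated arbitrarily precisely.
import numpy as np

rng = np.random.default_rng(seed=5)
mu_X, sigma_X = 100.0, 10.0        # illustrative "true" parameters

for n_run in (1_000, 10_000, 100_000):
    y = rng.normal(mu_X, sigma_X, size=n_run)    # y_i = x_i
    se = y.std(ddof=1) / np.sqrt(n_run)
    print(f"N_run={n_run:>7}: y_bar={y.mean():.3f}, s.e.={se:.4f}")
```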
2.8 Estimating the probability distribution of output variables, given imperfectly known inputs
But now, take the next case, a normal distribution with parameters
μX and
σX, but under the provision that
μX itself is slightly off, because we did not know
μX but used its imperfect estimate
\( \overline{x} \). So, we consider
$$ X\sim N\left(\overline{x},{\sigma}_{\mathrm{X}}\right) $$
Next, we again study the trivial function
$$ Y=f(X)=X $$
first analytically, using probability theory, and then through a Monte Carlo simulation.
Analytically, we find
$$ Y\sim N\left(\overline{x},{\sigma}_{\mathrm{X}}\right) $$
The essential point to observe is that the mean of Y is not μX but \( \overline{x} \), which is likely to be somewhat wrong.
Next, let us try this by a Monte Carlo simulation. We use
\( \overline{y} \) to estimate
μY. It will be close to
\( \overline{x} \), rather than close to
μX. Moreover, the standard error of this estimate is still
\( {s}_{\overline{Y}}=\frac{s_{\mathrm{Y}}}{\sqrt{N_{\mathrm{run}}}} \), so as close to 0 as we like. In fact,
$$ \underset{N_{\mathrm{run}}\to \infty }{\lim }C{I}_{\mu_{\mathrm{Y}};0.95}=\left[\overline{x},\overline{x}\right] $$
Summarizing, both probability theory and the Monte Carlo approach will give you the wrong value (\( \overline{x} \) instead of μX) when estimating μY, and the Monte Carlo approach will in addition suggest that this estimate is very precise, due to a vanishing standard error, at least when Nrun is very large.
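The same experiment can be run with the input mean replaced by an imperfect estimate \( \overline{x} \) from a small sample (all values illustrative): the Monte Carlo estimate then converges tightly around \( \overline{x} \), not around μX.

```python
# Sketch: the input mean is an imperfect estimate x_bar, not the true
# mu_X. Monte Carlo then converges, very precisely, to the wrong value:
# the interval collapses around x_bar instead of mu_X.
import numpy as np

rng = np.random.default_rng(seed=6)
mu_X, sigma_X = 100.0, 10.0        # illustrative "true" parameters

# The analyst only has a small sample, so uses x_bar instead of mu_X:
x_bar = rng.normal(mu_X, sigma_X, size=5).mean()

# Large Monte Carlo run with the imperfectly specified input:
n_run = 1_000_000
y = rng.normal(x_bar, sigma_X, size=n_run)

se = y.std(ddof=1) / np.sqrt(n_run)
print(f"true mu_X = {mu_X}, x_bar = {x_bar:.3f}")
print(f"y_bar = {y.mean():.3f} +/- {1.96 * se:.3f}  (near x_bar, not mu_X)")
```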
Observe that this is not a mistake or limitation of the Monte Carlo approach. In fact, it performs very well. The mistake is entirely due to the analyst, who uses an imperfectly estimated input parameter (
\( \overline{x} \) instead of
μX) to run an infinite-precision method. Also, observe that this situation is ubiquitous in LCA: most LCA data on unit processes is obtained from limited samples. Even a sample size of 1 is not uncommon. There is even a widely used approach, referred to as the pedigree approach and popularized by the ecoinvent database, whose purpose is to estimate a probability distribution from limited data (Frischknecht et al.
2004; Weidema et al.
2013). We devote a longer discussion to this problem toward the end of this paper.