1 Introduction
2 Indicators for single product alternatives
2.1 General introduction
2.2 Informal conclusions
 | Product A | Product B |
---|---|---|
Value ± uncertainty | \(50.27 \pm 10.41\) | \(60.22 \pm 11.85\) |
- the outcome of the “deterministic” LCA (i.e., the LCA result without uncertainties);
- the mean of the outcomes of the Monte Carlo series;
- the median of the outcomes of the Monte Carlo series;
- the geometric mean of the outcomes of the Monte Carlo series.
- ranges (min–max) of the outcomes of the Monte Carlo series;
- standard deviations of the outcomes of the Monte Carlo series;
- \(2\) (or \(1.96\)) times the standard deviations of the outcomes of the Monte Carlo series;
- geometric standard deviations of the outcomes of the Monte Carlo series;
- squared geometric standard deviations of the outcomes of the Monte Carlo series;
- percentile values (e.g., \({P}_{2.5}\) and \({P}_{97.5}\)) of the outcomes of the Monte Carlo series;
- the standard error of the mean of the outcomes of the Monte Carlo series;
- \(2\) (or \(1.96\)) times the standard error of the mean of the outcomes of the Monte Carlo series.
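As a minimal sketch, all of these centrality and dispersion indicators can be read off a single Monte Carlo series. The sample below is simulated (a lognormal whose magnitudes only loosely mimic this example, with numpy assumed available), not the article's actual series:

```python
import numpy as np

# Simulated Monte Carlo series; illustrative, not the article's data
rng = np.random.default_rng(0)
a = rng.lognormal(mean=np.log(50), sigma=0.2, size=1000)

mean_a = a.mean()                            # arithmetic mean
median_a = np.median(a)                      # median
geo_mean_a = np.exp(np.log(a).mean())        # geometric mean
sd_a = a.std(ddof=1)                         # sample standard deviation
gsd_a = np.exp(np.log(a).std(ddof=1))        # geometric standard deviation
p025, p975 = np.percentile(a, [2.5, 97.5])   # percentile values
sem_a = sd_a / np.sqrt(len(a))               # standard error of the mean
```

For a right-skewed (e.g., lognormal) series the geometric mean and median fall below the arithmetic mean, which is one reason the article reports several centrality statistics side by side.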
2.3 Means, standard deviations, standard errors, and confidence intervals
Statistic | Product A | Product B |
---|---|---|
Mean (\(\bar{a}\) and \(\bar{b}\)) | \(50.27\) | \(60.22\) |
Standard deviation (\({s}_{\mathrm{A}}\) and \({s}_{\mathrm{B}}\)) | \(10.41\) | \(11.85\) |
Standard error of the mean (\({s}_{\bar{a}}\) and \({s}_{\bar{b}}\)) | \(0.33\) | \(0.37\) |
\(95\%\) confidence interval for the mean (\({\mathrm{CI}}_{\mu_{\mathrm{A}},0.95}\) and \({\mathrm{CI}}_{\mu_{\mathrm{B}},0.95}\)) | \([49.63, 50.92]\) | \([59.49, 60.96]\) |
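The standard error and confidence interval in the table follow directly from the sample statistics. A sketch, assuming the large-sample normal approximation (\(z = 1.96\)) and an illustrative simulated series:

```python
import numpy as np

# Illustrative sample with roughly the magnitudes of the example
rng = np.random.default_rng(1)
a = rng.normal(50.27, 10.41, size=1000)

mean = a.mean()
sem = a.std(ddof=1) / np.sqrt(len(a))   # standard error of the mean
ci_low = mean - 1.96 * sem              # 95% confidence interval for the mean
ci_high = mean + 1.96 * sem
```

Note how the confidence interval for the mean is roughly 30 times narrower than the spread of the series itself: it quantifies how well the mean is estimated, not how variable the product's score is.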
2.4 Other centrality statistics
Statistic | Product A | Product B |
---|---|---|
Median (\({M}_{\mathrm{A}}\) and \({M}_{\mathrm{B}}\)) | \(49.46\) | \(59.33\) |
Geometric mean (\({\mathrm{GM}}_{\mathrm{A}}\) and \({\mathrm{GM}}_{\mathrm{B}}\)) | \(49.22\) | \(59.07\) |
2.5 Other uncertainty statistics
Statistic | Product A | Product B |
---|---|---|
Coefficient of variation (\({\mathrm{CV}}_{\mathrm{A}}\) and \({\mathrm{CV}}_{\mathrm{B}}\)) | \(0.21\) | \(0.20\) |
Geometric standard deviation (\({\mathrm{GSD}}_{\mathrm{A}}\) and \({\mathrm{GSD}}_{\mathrm{B}}\)) | \(1.23\) | \(1.22\) |
Interquartile range (\({\mathrm{IQR}}_{\mathrm{A}}\) and \({\mathrm{IQR}}_{\mathrm{B}}\)) | \(13.97\) | \(15.14\) |
\(95\%\) interpercentile interval (\(\left[{P}_{2.5,\mathrm{A}},{P}_{97.5,\mathrm{A}}\right]\) and \(\left[{P}_{2.5,\mathrm{B}},{P}_{97.5,\mathrm{B}}\right]\)) | \([32.83, 73.13]\) | \([39.18, 85.53]\) |
\(95\%\) interpercentile range (\({\mathrm{IPR}}_{95,\mathrm{A}}\) and \({\mathrm{IPR}}_{95,\mathrm{B}}\)) | \(40.31\) | \(46.35\) |
\(90\%\) uncertainty range (\({\mathrm{UR}}_{90,\mathrm{A}}\) and \({\mathrm{UR}}_{90,\mathrm{B}}\)) | \(1.99\) | \(1.88\) |
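These dispersion statistics can likewise be computed from one simulated series. A sketch with illustrative data; in particular, reading the \(90\%\) uncertainty range as the ratio \({P}_{95}/{P}_{5}\) (which matches its dimensionless magnitude of about \(2\) in the table) is my assumption:

```python
import numpy as np

# Illustrative lognormal series, not the article's data
rng = np.random.default_rng(2)
a = rng.lognormal(np.log(49.2), 0.21, size=1000)

cv = a.std(ddof=1) / a.mean()               # coefficient of variation
gsd = np.exp(np.log(a).std(ddof=1))         # geometric standard deviation
q25, q75 = np.percentile(a, [25, 75])
iqr = q75 - q25                             # interquartile range
p025, p975 = np.percentile(a, [2.5, 97.5])
ipr95 = p975 - p025                         # 95% interpercentile range
p5, p95 = np.percentile(a, [5, 95])
ur90 = p95 / p5                             # 90% uncertainty range (assumed ratio form)
```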
3 Indicators of difference for two product alternatives
3.1 General introduction
3.2 Null hypothesis significance testing
Statistic | Independent comparison | Dependent comparison |
---|---|---|
Value of \(t\)-statistic | \(- 19.95\) | \(- 64.40\) |
Degrees of freedom | \(1998\) | \(999\) |
Critical values of \(t\) | \(- 1.96\) and \(1.96\) | \(- 1.96\) and \(1.96\) |
\(p\) value | \(<{10}^{-6}\) | \(<{10}^{-6}\) |
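The two \(t\)-statistics in the table can be sketched with plain numpy; the series below are simulated, with the dependent case mimicked by building B from the same runs as A:

```python
import numpy as np

# Illustrative Monte Carlo series (not the article's data); b is built
# from a, so the two series are correlated, as in a dependent comparison
rng = np.random.default_rng(3)
n = 1000
a = rng.normal(50.27, 10.41, n)
b = a + rng.normal(9.95, 5.0, n)

# Independent two-sample t-statistic (equal sizes), df = 2n - 2
sp2 = (a.var(ddof=1) + b.var(ddof=1)) / 2          # pooled variance
t_ind = (a.mean() - b.mean()) / np.sqrt(sp2 * 2 / n)

# Paired t-statistic on the per-run differences, df = n - 1
d = a - b
t_dep = d.mean() / (d.std(ddof=1) / np.sqrt(n))
```

Because the paired test removes the shared run-to-run variation, its \(t\)-statistic is much larger in magnitude than the independent one, just as in the table.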
- The NHST procedure can never exclude the possibility that the null hypothesis of equal means has been incorrectly rejected. In fact, the chance of drawing this wrong conclusion is set by \(\alpha\) (\(5\%\) in this example). Such an event is referred to as a type I error.
- The procedure may also fail to detect a difference that is actually present. Depending on the test details, the probability of such a type II error (\(\beta\)) can be quite high.
- The term “significant” may suggest to a quick reader that the difference is large or important. In fact, it is only jargon for the fact that NHST has established that there is (probably) a difference; the difference itself can be small or large. Indeed, Vercalsteren et al. (2010) write that “the reusable PC cup has a significant more favorable environmental score than the one-way cups,” without a hypothesis test, demonstrating the danger inherent in the term “significant.”
- After a rejection of the null hypothesis of equality of means, one can conclude that the means are not equal. But a decision-maker of course wants to know which product is better. A so-called post-hoc analysis is therefore needed to find out more.
Statistic | Independent comparison | Dependent comparison |
---|---|---|
Sum of ranks (\({S}_{\mathrm{A}}\) or \({S}_{+}\)) | \(759626\) | \(22686\) |
Value of \(z\)-statistic | \(- 18.65\) | \(- 24.91\) |
Critical values of \(z\) | \(- 1.96\) and \(1.96\) | \(- 1.96\) and \(1.96\) |
\(p\) value | \(<{10}^{-6}\) | \(<{10}^{-6}\) |
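The rank-sum statistic and its normal approximation for the independent comparison can be sketched as follows, again on simulated data of the same rough magnitudes:

```python
import numpy as np

# Illustrative independent Monte Carlo series
rng = np.random.default_rng(4)
n = 1000
a = rng.normal(50.27, 10.41, n)
b = rng.normal(60.22, 11.85, n)

# Rank the pooled 2n outcomes (ties are negligible for continuous data)
pooled = np.concatenate([a, b])
ranks = pooled.argsort().argsort() + 1
s_a = ranks[:n].sum()                      # sum of the ranks of series A

# Normal approximation of the rank sum under the null hypothesis
mu_s = n * (2 * n + 1) / 2
sigma_s = np.sqrt(n * n * (2 * n + 1) / 12)
z = (s_a - mu_s) / sigma_s
```

Since the test uses only ranks, it makes no assumption about the shape of the two distributions, which is why it is a common companion to the \(t\)-test.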
3.3 The standardized mean difference
Statistic | Independent comparison | Dependent comparison |
---|---|---|
Cohen’s \(d\) | \(- 0.89\) | \(- 2.04\) |
Pearson’s \(r\) | \(- 0.41\) | \(- 0.90\) |
Fraction explained (\({r}^{2}\)) | \(0.21\) | \(0.81\) |
Value of \(t\)-statistic | \(- 14.10\) | \(- 64.40\) |
Critical values of \(t\) | \(- 1.96\) and \(1.96\) | \(- 1.96\) and \(1.96\) |
Degrees of freedom | \(998\) | \(999\) |
\(p\) value | \(<{10}^{-6}\) | \(<{10}^{-6}\) |
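Cohen's \(d\) and its conversion to Pearson's \(r\) for the independent comparison can be sketched as below; the \(r\)-from-\(d\) formula used here assumes equal group sizes, and the data are illustrative:

```python
import numpy as np

# Illustrative independent Monte Carlo series
rng = np.random.default_rng(5)
n = 1000
a = rng.normal(50.27, 10.41, n)
b = rng.normal(60.22, 11.85, n)

# Cohen's d: mean difference divided by the pooled standard deviation
sp = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
d = (a.mean() - b.mean()) / sp

# Pearson's r from d (equal group sizes), and the fraction explained
r = d / np.sqrt(d**2 + 4)
r2 = r**2
```

Unlike the \(t\)-statistic, \(d\) does not grow with the number of Monte Carlo runs: it measures the size of the difference relative to the spread, not the certainty about it.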
3.4 “Modified” null hypothesis significance testing
Statistic | Independent comparison | Dependent comparison |
---|---|---|
Value of \(t\)-statistic | \(19.75\) | \(64.40\) |
Critical value of \(t\) | \(1.65\) | \(1.65\) |
Degrees of freedom | \(1998\) | \(999\) |
\(p\) value | \(<{10}^{-6}\) | \(<{10}^{-6}\) |
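One reading of this “modified” procedure, suggested by the positive \(t\)-values and the one-sided critical value of \(1.65\) in the table, is a one-sided test of \({H}_{0}\): \({\mu }_{\mathrm{B}} \le {\mu }_{\mathrm{A}}\) against \({H}_{1}\): \({\mu }_{\mathrm{B}} > {\mu }_{\mathrm{A}}\). A sketch under that reading, with illustrative data:

```python
import numpy as np

# One-sided ("modified") test sketch; interpretation and data are
# assumptions for illustration, not the article's exact procedure
rng = np.random.default_rng(6)
n = 1000
a = rng.normal(50.27, 10.41, n)
b = rng.normal(60.22, 11.85, n)

sp2 = (a.var(ddof=1) + b.var(ddof=1)) / 2
t = (b.mean() - a.mean()) / np.sqrt(sp2 * 2 / n)  # positive when B is higher
reject_h0 = t > 1.65                              # one-sided decision, alpha = 0.05
```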
3.5 Nonoverlap statistics
Statistic | Independent comparison | Dependent comparison |
---|---|---|
Cohen’s \({U}_{1}\) | \(0.51\) | − |
Cohen’s \({U}_{2}\) | \(0.67\) | − |
Cohen’s \({U}_{3}\) | \(0.81\) | − |
Grice and Barrett (\({U}_{1}^{\prime}\)) | \(0.34\) | − |
McGraw and Wong (\(\mathrm{CLES}\)) | \(0.74\) | − |
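Two of these nonoverlap statistics have direct empirical counterparts that need no normality assumption. A sketch on simulated series (not the article's data):

```python
import numpy as np

# Illustrative independent Monte Carlo series
rng = np.random.default_rng(7)
n = 1000
a = rng.normal(50.27, 10.41, n)
b = rng.normal(60.22, 11.85, n)

# Empirical CLES: fraction of all (a_i, b_j) pairs in which the B draw
# exceeds the A draw
cles = (b[None, :] > a[:, None]).mean()

# Empirical counterpart of Cohen's U3: fraction of the B sample that
# lies above the median of the A sample
u3 = (b > np.median(a)).mean()
```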
3.6 Other nonoverlap statistics
3.7 Comparison indicator and discernibility analysis
Statistic | Independent comparison | Dependent comparison |
---|---|---|
Probability that \(\mathrm{CI}>1\) | \(0.98\) | \(0.74\) |
Discernibility (\(K\)) | \(0.98\) | \(0.74\) |
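A discernibility analysis on paired runs simply counts the runs in which one product scores worse than the other. A sketch, assuming the comparison indicator takes the per-run ratio form \(\mathrm{CI} = b/a\) (my reading) and using illustrative data:

```python
import numpy as np

# Paired (dependent) Monte Carlo series, illustrative only
rng = np.random.default_rng(8)
n = 1000
a = rng.normal(50.27, 10.41, n)
b = a + rng.normal(9.95, 15.0, n)

ci = b / a               # per-run comparison indicator (assumed ratio form)
k = (ci > 1).mean()      # discernibility: share of runs in which B exceeds A
```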
3.8 Other measures of superiority
4 More than two product alternatives
Statistic | Independent comparison | Dependent comparison |
---|---|---|
Probability of superiority of A w.r.t. B (\({K}_{2,\mathrm{A}}\)) | \(0.74\) | − |
Probability of superiority of B w.r.t. A (\({K}_{2,\mathrm{B}}\)) | \(0.26\) | − |
Probability of superiority of A w.r.t. the median B (\({K}_{3,\mathrm{A}}\)) | \(0.82\) | − |
Probability of superiority of B w.r.t. the median A (\({K}_{3,\mathrm{B}}\)) | \(0.18\) | − |
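The two superiority statistics have straightforward empirical estimates; a sketch on simulated independent series (not the article's data):

```python
import numpy as np

# Illustrative independent Monte Carlo series
rng = np.random.default_rng(9)
n = 1000
a = rng.normal(50.27, 10.41, n)
b = rng.normal(60.22, 11.85, n)

# K2: probability that a random A outcome is lower than a random B
# outcome (lower taken as better), over all pairs of draws
k2_a = (a[:, None] < b[None, :]).mean()
k2_b = 1.0 - k2_a                       # complementary by construction

# K3: probability that a random A outcome beats the *median* B outcome
k3_a = (a < np.median(b)).mean()
```

The exact complementarity of \({K}_{2,\mathrm{A}}\) and \({K}_{2,\mathrm{B}}\) is the "adding to \(1\)" property discussed below.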
- For some of the statistics, there is a situation of symmetry, so the cell with “X ↔ Y” contains the same result as “Y ↔ X.” This is, for instance, the case for \(p\)-values from NHST.
- For other statistics, the two cells contain exactly opposite (antisymmetric) information. For instance, for Cohen’s \(d\), “X ↔ Y” has the same numerical result as “Y ↔ X,” except for a minus sign.
- Still other statistics carry complementary information. For instance, for the probability of superiority, the values in “X ↔ Y” and “Y ↔ X” add up to exactly \(1\).
- The most interesting situation occurs for those statistics for which “X ↔ Y” and “Y ↔ X” are not uniquely related, so for which the second cell contains information that we could not guess from the first cell alone. This is, for instance, the case for the modified NHST (see also Mendoza Beltrán et al. 2018).
5 Discussion
- If the answer is around \(50\%\), the two products are indifferent, and tossing a coin is as reliable as using the LCA result.
- If the answer is much less than \(50\%\), we should choose product B.
- If the answer is much more than \(50\%\), we should choose product A.
- If the answer is “just a bit,” it is questionable whether we should invest the money and time to switch.
- If the answer is “a lot,” a choice will pay off.
 | product A | product B | product C | product D |
---|---|---|---|---|
product A | − | A ↔ B | A ↔ C | A ↔ D |
product B | B ↔ A | − | B ↔ C | B ↔ D |
product C | C ↔ A | C ↔ B | − | C ↔ D |
product D | D ↔ A | D ↔ B | D ↔ C | − |
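A table like this can be filled programmatically with any pairwise statistic. A sketch using an empirical superiority probability; the three products and their distributions are made up for illustration:

```python
import numpy as np

# Made-up Monte Carlo series for three alternatives
rng = np.random.default_rng(10)
n = 1000
products = {
    "A": rng.normal(50, 10, n),
    "B": rng.normal(60, 12, n),
    "C": rng.normal(55, 8, n),
}

names = list(products)
k = np.full((len(names), len(names)), np.nan)   # diagonal stays empty
for i, x in enumerate(names):
    for j, y in enumerate(names):
        if i != j:
            # probability that a random X draw is lower than a random Y draw
            k[i, j] = (products[x][:, None] < products[y][None, :]).mean()
```

For this particular statistic the cell "X ↔ Y" and its mirror "Y ↔ X" add up to \(1\), so half of the off-diagonal table is redundant; for other statistics, as noted above, both halves carry information.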
What is the probability that a randomly selected specimen of product A performs better than a randomly selected specimen of product B? → | low | \(\approx 50\%\) | high |
---|---|---|---|
How much will a randomly selected specimen of product A perform better than a randomly selected specimen of product B? ↓ | | | |
a bit | questionable | never mind | questionable |
a lot | choose B | questionable | choose A |
Statistic | Answers: “What is the probability that a randomly selected specimen of product A performs better than a randomly selected specimen of product B?” | Answers: “How much will a randomly selected specimen of product A perform better than a randomly selected specimen of product B?” |
---|---|---|
difference in mean or median | no | yes |
NHST with \(t\)-test or Wilcoxon–Mann–Whitney test | no | no |
modified NHST | no | no |
Cohen’s \(d\), Pearson’s \(r\) | no | yes |
nonoverlap statistics (\({U}_{1}\), \({U}_{2}\), \({U}_{3}\), \(\mathrm{CLES}\)) | yes | no |
Bhattacharyya coefficient and overlapping coefficient | yes | no |
comparison index and discernibility | yes | no |
superiority (\({K}_{2}\), \({K}_{3}\)) | yes | no |
modified comparison index (\({K}_{4}\); see below) | yes | yes |
- the threshold value of the comparison index (\({\gamma }_{0}\));
- the minimum probability of beating an inferior product alternative (is \(51\%\) enough?);
- the maximum probability of being beaten by an inferior product alternative (is \(10\%\) acceptable?).
 | Independent comparison | Dependent comparison |
---|---|---|
Probability of threshold superiority of A (\({K}_{4,\mathrm{A}}\)) | \(0.51\) | \(0.49\) |
Probability of threshold superiority of B (\({K}_{4,\mathrm{B}}\)) | \(0.10\) | \(0.00\) |
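A sketch of such a thresholded comparison index: a run only counts toward the superiority of A when B exceeds A by at least a factor \({\gamma }_{0}\). The threshold value of \(1.1\), the ratio form of the comparison index, and the data are all illustrative assumptions, not the article's settings:

```python
import numpy as np

# Paired (dependent) Monte Carlo series, illustrative only
rng = np.random.default_rng(11)
n = 1000
a = rng.normal(50.27, 10.41, n)
b = a + rng.normal(9.95, 15.0, n)

gamma0 = 1.1                       # assumed threshold, for illustration
ci = b / a                         # per-run comparison indicator (assumed ratio)
k4_a = (ci > gamma0).mean()        # probability of threshold superiority of A
k4_b = (ci < 1 / gamma0).mean()    # probability of threshold superiority of B
```

By construction, thresholding can only lower each probability relative to the plain \(\mathrm{CI} > 1\) count, which is how the indicator builds in a safety margin against switching for a negligible gain.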
- it works on the empirical distribution of results, so no assumptions about normality, lognormality, or other distributions are needed;
- it does not require choosing a specific centrality or dispersion statistic, such as the mean or the standard deviation, but employs the full distribution;
- it can easily be generalized to comparisons of three or more product alternatives;
- it is easy to implement in software;
- the procedure is not sensitive to \(p\)-hacking;
- the results are easy to grasp for less statistically trained people.