Thumbnail sketches of three case studies illustrate the probabilistic dependence issues discussed above: the USGS approach to probabilistic dependencies among oil and gas assessment units, the USGS probabilistic assessment of CO2 sequestration in mature oil and gas reservoirs in the United States, and a Canadian Geological Survey study of the use of copulas to capture probabilistic dependencies among accumulations in individual oil and gas plays.
5.3.1 USGS Oil and Gas Resource Projections
The USGS developed an assessment system in the 1980s with the acronym FASP (fast appraisal system for petroleum resources). FASP incorporated perfect positive correlation between micro-level reservoir attributes but allowed specification of any positive correlation in the course of aggregating play resources. However, the USGS 2000 World Petroleum Assessment aggregates undiscovered resource volumes from assessment unit level to regional level using perfect correlation as the argument for adding assessment unit fractiles to arrive at regional level aggregates. Recognizing that at the global level large regional aggregates of resources are unlikely to be perfectly correlated, the authors adopt a pairwise correlation of 0.5 between pairs of eight regions (Klett et al. 2000). No sensitivity analysis of how aggregate projections vary with these particular choices is provided.
Many USGS assessment studies present tables of fractiles of individual assessment units and then add them to arrive at a fractile assessment of total resources. Addition is qualified by the statement that “Fractiles are additive under assumption of perfect positive correlation”, which avoids direct assessment of dependencies among units. Table 2 in “Assessment of Undiscovered Continuous Oil and Gas Resources in the Monterey Formation, San Joaquin Basin Province, California” (USGS Fact Sheet 2015-3058, September 2015) and Table 2 in “Assessment of Potential Shale-Oil and Shale-Gas Resources in Silurian shales of Jordan” (USGS Fact Sheet 2014–3082, September 2014) are examples. Chen et al. (2012) cite additional examples (Klett et al. 2000, 2005; Klett 2004). It is easy to show that “perfect correlation” is not robust to variations in specification of the functional form of marginal distributions elicited from geologists. Worse, addition of fractiles without careful attention to properties of the joint distribution of a set of uncertain quantities can lead to incoherence. On the other hand, mutual independence allows specification of arbitrary marginal probability distributions without doing violence to coherence but often leads to an unacceptably narrow probability projection of sums of oil and gas magnitudes.
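The contrast between these two extremes can be illustrated numerically. The Python sketch below is only an illustration under stated assumptions: the three lognormal assessment units and their parameters are hypothetical, the function names are not drawn from any USGS software, and "perfect positive correlation" is read here as co-monotonicity (every unit driven by one common factor). Under that reading the added marginal fractiles reproduce the fractiles of the sum, whereas under mutual independence the 5th to 95th percentile range of the sum is narrower.

```python
import math
import random
from statistics import NormalDist

# Hypothetical (mu, sigma) of log-volume for three assessment units.
UNITS = [(0.0, 0.5), (1.0, 1.0), (0.5, 0.8)]


def lognormal_fractile(p, mu, sigma):
    """Volume not exceeded with probability p for a lognormal marginal."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


def added_fractiles(p, units=UNITS):
    """Sum of the marginal fractiles, as in a table of added fractiles."""
    return sum(lognormal_fractile(p, mu, sigma) for mu, sigma in units)


def simulate_sums(n_draws, comonotonic, units=UNITS, seed=1):
    """Sorted Monte Carlo draws of the total volume.

    comonotonic=True drives every unit with one shared normal draw;
    comonotonic=False gives each unit its own independent draw.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_draws):
        shared = rng.gauss(0.0, 1.0)
        total = 0.0
        for mu, sigma in units:
            z = shared if comonotonic else rng.gauss(0.0, 1.0)
            total += math.exp(mu + sigma * z)
        totals.append(total)
    totals.sort()
    return totals


def empirical_fractile(sorted_draws, p):
    """Simple order-statistic estimate of the p-th fractile."""
    index = min(len(sorted_draws) - 1, int(p * len(sorted_draws)))
    return sorted_draws[index]


if __name__ == "__main__":
    como = simulate_sums(20000, comonotonic=True)
    indep = simulate_sums(20000, comonotonic=False)
    print("p, added fractiles, co-monotonic sum, independent sum")
    for p in (0.05, 0.50, 0.95):
        print(p, added_fractiles(p),
              empirical_fractile(como, p), empirical_fractile(indep, p))
```

The sketch does not address coherence; it only shows how strongly the spread of the aggregate depends on which of the two extreme assumptions is adopted.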
A salient feature of Pearson’s correlation coefficient is that random variables \( X \) and \( Y \) possess correlation \( 1.0 \) or \( -1.0 \) only if \( X \) and \( Y \) are linearly dependent. As Denuit and Dehaene (2003) point out, a limiting case is a bivariate normal pair of random variables for which the variance of one member of the pair is zero. If \( X \) and \( Y \) are jointly lognormal and \( \log X \) is a linear function of \( \log Y \), the Pearson correlation of \( \log X \) and \( \log Y \) is either 1.0 or −1.0. However, unless \( X \) is itself proportional to \( Y \), the Pearson correlation of \( X \) and \( Y \) is then less than 1.0 in absolute value. Denuit and Dehaene provide a more nuanced treatment. Suppose \( F_{1} \) and \( F_{2} \) are marginal cumulative distribution functions of \( X \) and \( Y \) respectively, each concentrated on \( (0,\infty ) \), and \( U \) is a uniform \( (0,1) \) random variable independent of \( X \) and \( Y \). Using super-modularity these authors prove that if the joint distribution of \( (X, Y) \) lies in the Fréchet space of bivariate distributions with marginals \( F_{1} \) and \( F_{2} \), the Pearson correlation coefficient \( r(X, Y) \) of \( X \) and \( Y \) is bounded by
$$ \frac{{Cov(F_{1}^{ - 1} (U),F_{2}^{ - 1} (1 - U))}}{{\sqrt {Var(X)} \sqrt {Var(Y)} }} \le r(X,Y) \le \frac{{Cov(F_{1}^{ - 1} (U),F_{2}^{ - 1} (U))}}{{\sqrt {Var(X)} \sqrt {Var(Y)} }}. \tag{5.5} $$
In this setting perfect correlation is generally not achievable. They also prove that it is possible for a pair of co-monotonic lognormal random variables to have pairwise correlation close to zero, contradicting the intuitive notion that small correlation implies weak dependence. Denuit and Dehaene call attention to Shih and Huang’s (1992) and Schechtman and Yitzhaki’s (1999) observation that, for any two random variables, the achievable range of Pearson’s correlation coefficient is the full interval from −1 to 1 only if the functional forms of the two marginal distributions differ solely in values of location and/or scale parameters. If not, the range of Pearson’s r is narrower and depends on the shape of the two marginal distributions.
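For lognormal marginals the Fréchet bounds on Pearson’s correlation can be written in closed form, which makes the narrowing of the attainable range easy to see. The sketch below assumes \( \log X \sim N(\mu_{x}, \sigma_{x}^{2}) \) and \( \log Y \sim N(\mu_{y}, \sigma_{y}^{2}) \); the upper bound corresponds to \( X = \exp(\mu_{x} + \sigma_{x} Z) \), \( Y = \exp(\mu_{y} + \sigma_{y} Z) \) for a single standard normal \( Z \), and the lower bound to \( Y = \exp(\mu_{y} - \sigma_{y} Z) \). The function name and the parameter values are illustrative choices, not taken from the cited papers.

```python
import math


def lognormal_correlation_bounds(sigma_x, sigma_y):
    """Attainable range of Pearson's r for two lognormal marginals.

    Upper bound: X and Y co-monotonic (both increasing in one normal Z).
    Lower bound: X and Y counter-monotonic (Y decreasing in Z).
    The location parameters cancel and therefore do not appear.
    """
    denom = math.sqrt(math.expm1(sigma_x ** 2) * math.expm1(sigma_y ** 2))
    upper = math.expm1(sigma_x * sigma_y) / denom
    lower = math.expm1(-sigma_x * sigma_y) / denom
    return lower, upper


if __name__ == "__main__":
    for sigma_y in (1.0, 2.0, 4.0):
        lower, upper = lognormal_correlation_bounds(1.0, sigma_y)
        print(f"sigma_x=1, sigma_y={sigma_y}: {lower:.4f} <= r <= {upper:.4f}")
```

With \( \sigma_{x} = \sigma_{y} \) the upper bound equals 1, but with \( \sigma_{x} = 1 \) and \( \sigma_{y} = 4 \) even a co-monotonic pair has a Pearson correlation of only about 0.014.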
These authors document several important features of Kendall’s \( \tau \) and Spearman’s \( \rho \). (Spearman’s \( \rho \) is at the center of the Iman and Conover method deployed in the USGS (2013) study of \( CO_{2} \) sequestration to compute predictive probability distributions of aggregates.) First, both are invariant with respect to strictly increasing transformations of either variable. Second, when one variable is a non-decreasing (non-increasing) transformation of the other they equal 1 (resp. −1); that is, unlike Pearson’s \( r \), they attain the values 1 and −1 at the Fréchet upper and lower bounds. According to Denuit and Dehaene, Kendall’s \( \tau \) and Spearman’s \( \rho \) are more desirable measures of association for non-normal multivariate distributions than Pearson’s \( r \) because Pearson’s \( r \) does not share these invariance properties. The invariance properties come into play in Iman and Conover’s method discussed below. Denuit and Dehaene also prove the non-obvious fact that if a positively or negatively quadrant dependent pair of random variables is uncorrelated, its members are independent.
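The invariance property can be checked directly on simulated data. The following sketch uses only the Python standard library; the sample, the seed and the transformations \( x \mapsto e^{x} \) and \( y \mapsto e^{2y} \) are arbitrary choices made for illustration, and ties are assumed absent. Both rank measures are unchanged by the transformations while Pearson’s coefficient is not.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation coefficient."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def ranks(values):
    """Ranks 1..n; assumes no ties (almost sure for continuous data)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def kendall(a, b):
    """Kendall's tau by direct counting of concordant and discordant pairs."""
    n = len(a)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            concordant = (a[i] > a[j]) == (b[i] > b[j])
            score += 1 if concordant else -1
    return score / (n * (n - 1) / 2)


def make_sample(n=400, seed=7):
    """Correlated normal pairs; parameters are arbitrary."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.8 * xi + 0.6 * rng.gauss(0.0, 1.0) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = make_sample()
    x_t = [math.exp(v) for v in x]        # strictly increasing transform
    y_t = [math.exp(2.0 * v) for v in y]  # strictly increasing transform
    print("Pearson :", pearson(x, y), pearson(x_t, y_t))
    print("Spearman:", spearman(x, y), spearman(x_t, y_t))
    print("Kendall :", kendall(x, y), kendall(x_t, y_t))
```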
All of this emphasizes that “perfect correlation” as an omnibus argument for adding fractiles has many pitfalls. Co-monotonic bounds on random sums are a conceptually satisfactory alternative that deserves much future study.
5.3.2 USGS Probabilistic Assessment of CO2 Storage Capacity
A recent USGS probabilistic assessment of \( CO_{2} \) sequestration in mature petroleum reservoirs (Blondes et al. 2013a, b) is based on both micro- and macro-assessments by geologists. Their macro-assessment aggregates storage assessment units (SAUs) at basin, regional and national levels. An objective was to provide probabilistic assessments that take into account dependencies among assessment units arising from “overlap of geologic analogs, assessment methods and assessors” using individual SAU marginal probability distributions and “…a correlation matrix obtained by expert elicitation describing interdependencies between pairs of SAUs”. The correlation matrix dimension is \( 192 \times 192 \). Because a menagerie of marginal distributions (Beta-PERT, lognormal, truncated lognormal) was deployed at the micro-level, use of standard multivariate distribution theory is not appropriate. Dependencies among storage capacity magnitudes are induced using an innovative distribution-free method developed by Iman and Conover (1982) that allows marginal distribution shapes to be estimated from data sets distinct from data sets used to estimate dependency structure. Their method is designed to provide rank correlations that match assessed correlations and to translate the match into predictive probability distributions for individual assessment units and larger aggregates. (See Blondes et al. 2013a for informative examples.)
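The core idea of rank-based reordering can be sketched for two units. The code below is a simplified bivariate variant written for illustration, not the USGS implementation: the two SAU marginals (one Beta-PERT, one lognormal), their parameters and the target rank correlation of 0.7 are hypothetical, and the sketch omits the adjustment that Iman and Conover apply for the sample correlation among the score columns. Independent marginal samples are rearranged so that their ranks follow those of correlated normal scores; the marginal samples themselves are left unchanged.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ranks(values):
    """Zero-based ranks; assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def reorder_like(sample, reference):
    """Rearrange `sample` so that its ranks match those of `reference`."""
    ordered = sorted(sample)
    return [ordered[r] for r in ranks(reference)]


def pert_draw(rng, low, mode, high):
    """Beta-PERT draw with the conventional weight of 4 on the mode."""
    alpha = 1.0 + 4.0 * (mode - low) / (high - low)
    beta = 1.0 + 4.0 * (high - mode) / (high - low)
    return low + (high - low) * rng.betavariate(alpha, beta)


def induce_rank_correlation(x, y, target, rng):
    """Reorder two samples to approximate a target rank correlation."""
    n = len(x)
    scores = [STD_NORMAL.inv_cdf((i + 1) / (n + 1)) for i in range(n)]
    z1, z2 = scores[:], scores[:]
    rng.shuffle(z1)
    rng.shuffle(z2)
    # Second row of the 2 x 2 Cholesky factor of the target matrix.
    mixed = [target * a + math.sqrt(1.0 - target ** 2) * b
             for a, b in zip(z1, z2)]
    return reorder_like(x, z1), reorder_like(y, mixed)


def spearman(a, b):
    """Spearman's rho from rank differences (no ties assumed)."""
    n = len(a)
    d2 = sum((p - q) ** 2 for p, q in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def run_example(n=2000, target=0.7, seed=11):
    """Hypothetical two-unit example; returns raw and reordered samples."""
    rng = random.Random(seed)
    x = [pert_draw(rng, 1.0, 3.0, 10.0) for _ in range(n)]
    y = [rng.lognormvariate(1.0, 0.6) for _ in range(n)]
    x_dep, y_dep = induce_rank_correlation(x, y, target, rng)
    return x, y, x_dep, y_dep


if __name__ == "__main__":
    x, y, x_dep, y_dep = run_example()
    print("rank correlation before:", spearman(x, y))
    print("rank correlation after :", spearman(x_dep, y_dep))
```

Summing the reordered columns row by row then gives draws of the aggregate that reflect the induced dependence. Because only ranks are manipulated, the same steps apply whatever the shapes of the marginal distributions.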
How to aggregate from basin, to region and then to a national scale is an issue. Should this be done in a single stage using the correlation matrix for all SAUs in the study, or by successively aggregating subsets of SAUs in multiple stages? Blondes et al. (2013b) conclude that:
“Although the single-stage approach requires determination of significantly more correlation coefficients, it captures geologic dependencies among similar units in different basins and it is less sensitive to fluctuations in low correlation coefficients than the multiple stage approach. Thus, subsets of one single-stage correlation matrix are used to aggregate to basin, regional, and national scales.”
Successive aggregation in multiple stages drastically reduces the number of pairwise correlations that must be elicited from geologists, at the expense of requiring each assessor to appraise pairwise correlations of sums of assessment unit magnitudes. Although there are no studies comparing how well geologists’ assessments calibrate when asked to appraise dependencies among sums of SAU magnitudes relative to appraisal of dependencies among individual SAUs, it is reasonable to conjecture that individual SAU appraisals are much more likely to be well calibrated. Properties of single and multi-stage appraisal methods are studied in Kaufman et al. (2018).
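A simple count gives a sense of the elicitation burden. A single-stage matrix over all \( n = 192 \) SAUs has \( n(n-1)/2 \) distinct pairwise correlations. For comparison, the second expression assumes a purely hypothetical two-stage scheme in which the 192 units are split into 8 groups of 24, with correlations elicited within each group and then among the 8 group totals; this grouping is an illustration only and is not the one considered by Blondes et al.
$$ \frac{192 \times 191}{2} = 18{,}336, \qquad 8 \times \frac{24 \times 23}{2} + \frac{8 \times 7}{2} = 2{,}208 + 28 = 2{,}236. $$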
5.3.3 Copulas and Oil and Gas Resource Assessment
Chen et al. (2012) emphasize that at an assessment micro-level, reservoir attributes such as porosity, permeability, pressure and temperature are often decisively dependent and that empirical data suggest dependencies are present among more aggregate assessment units in mature provinces, among fields in a mature play or basin for example. Their argument is that a basin’s tectonic framework exerts “strong geographic control” over many geological features and leads to geographic and spatial dependencies, and that because plays in a given basin share “…petroleum system elements, such as source rocks, regional top seal, migration fairways, timing, regional tectonics for trap formation, and accumulation preservation factors”, a probabilistic model of pools or fields in a play in a given basin should incorporate probabilistic dependencies among these attributes as well as between plays. They are the first to use copulas in this setting.
Sklar (1959) proved that, subject to mild restrictions, a multivariate cumulative distribution can be mapped into a joint cumulative distribution of uniform random variables called a copula. As with Iman and Conover’s method, adoption of a copula model allows marginal distribution shapes to be estimated from data sets distinct from those used to estimate dependency structure.
Suppose as in Sect. 5.2 above that \( F_{X} \) is the distribution function of a random vector \( {\mathbf{X}} = (X_{1} ,\ldots,X_{n} )^{t} \) with domain \( {\mathbf{R}}^{n} \) and continuous marginal cumulative distributions \( F_{i} ,\, i = 1,\ldots,n \). Let \( {\mathbf{U}}_{n} = (U_{1} ,\ldots,U_{n} ) \), with \( U_{i} = F_{i} (X_{i} ) \), be the corresponding vector of uniform \( (0,1) \) random variables, let \( C \) denote its joint distribution function (the copula), and let \( {\mathbf{u}}_{n} = (u_{1} , \ldots ,u_{n} ) \) be a realization of \( {\mathbf{U}}_{n} \). The \( U_{i} \) are individually uniform but in general not independent; their dependence is exactly what \( C \) describes. Then with \( u_{i} = F_{i} (x_{i} ),\, i = 1,\ldots ,n \), \( Prob\{ X_{1} \le x_{1} ,\ldots,X_{n} \le x_{n} \} = Prob\{ U_{1} \le u_{1} ,\ldots,U_{n} \le u_{n} \} = C(u_{1} ,\ldots,u_{n} ) \).
Let \( f_{i} \) denote the density of \( F_{i} ,\, i = 1,\ldots,n \), and let \( c \) denote the density of the copula \( C \), so that \( dC(u_{1} ,\ldots,u_{n} ) = c(u_{1} ,\ldots ,u_{n} )du_{1} \ldots du_{n} \). The joint density of \( {\mathbf{X}} \) can be written as \( c(u_{1} ,\ldots,u_{n} ) \times f_{1} (x_{1} ) \times \ldots \times f_{n} (x_{n} ) \) with \( u_{i} = F_{i} (x_{i} ) \). The term \( c \) in the joint density captures the dependency structure of elements of \( {\mathbf{X}} \). Because \( Prob\{ X_{1} \le x_{1} ,\ldots,X_{n} \le x_{n} \} = Prob\{ U_{1} \le u_{1} ,\ldots,U_{n} \le u_{n} \} \), a procedure for generating samples from \( C \) produces samples of \( {\mathbf{X}} \) by inversion of \( u_{i} = F_{i} (x_{i} ),\, i = 1,\ldots ,n \).
Computation requires choice of a copula functional form. Among a variety of choices Chen et al. chose the bivariate normal copula, a popular choice closely tied to standard multivariate normal distribution theory.
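Sampling by inversion from a bivariate normal copula can be sketched in a few lines. The example below is a minimal illustration, not the procedure of Chen et al.: the copula parameter of 0.6, the lognormal and exponential marginals and all function names are assumptions chosen to show that the dependence structure and the marginal shapes are specified separately.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_copula_draw(rng, rho):
    """One draw (u1, u2) from a bivariate normal copula with parameter rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)


def clamp(u, eps=1e-12):
    """Keep u strictly inside (0, 1) so the inverse CDFs stay finite."""
    return min(max(u, eps), 1.0 - eps)


def lognormal_inverse(u, mu, sigma):
    """Inverse CDF of a lognormal marginal."""
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(clamp(u)))


def exponential_inverse(u, mean):
    """Inverse CDF of an exponential marginal with the given mean."""
    return -mean * math.log(1.0 - clamp(u))


def sample_pair(n, rho, seed=3):
    """Draws of (X1, X2) with hypothetical lognormal and exponential marginals."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u1, u2 = normal_copula_draw(rng, rho)
        pairs.append((lognormal_inverse(u1, 2.0, 0.7),
                      exponential_inverse(u2, 5.0)))
    return pairs


if __name__ == "__main__":
    for x1, x2 in sample_pair(5, 0.6):
        print(round(x1, 3), round(x2, 3))
```

Replacing either inverse CDF changes a marginal without touching the dependence structure, and changing `rho` alters the dependence without touching the marginals.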
Their regional resource assessment of the Canadian Arctic’s Beaufort-Mackenzie Basin is based on analysis of 48 “significant” oil and gas discoveries containing 53 distinct accumulations. Empirical data are sufficiently detailed to allow study and estimation of pairwise correlations among reservoir attributes (area, porosity, oil saturation, net pay) for plays in the three major petroleum systems. The authors treat geologic risk factors as probabilistically independent because the data are not sufficient to allow empirical estimation of dependencies among them. They restrict their study of dependencies to reservoir volume attributes within each play and, through them, to the impact of probabilistic dependencies on the distribution of total resource volumes.
Four plays, Ivik, Taglu, Kugmallit (East) and Kugmallit (West), are used to illustrate how to incorporate dependencies among individual play resources. Although no systematic method for eliciting geologists’ judgments about between-play dependencies is discussed, the authors motivate their choice of a rather large correlation between plays (0.6) and of perfect correlation (1.0) by noting that all four plays share the same source rock and petroleum system: “The resource richness of each play is basically a function of both the oil charge and the preservation of accumulations that are mostly controlled by common petroleum system elements… we infer that the resources in the four plays are highly correlated, although the pool size distributions among the four plays vary considerably.” Pairwise correlations between area, net pay, porosity and oil saturation vary from a low of 0.20 to a high of 0.86. The authors call attention to the substantial difference between total ultimate oil resource medians under the assumption of independence and under the assumption of within- and between-play correlations: the latter is 1.6 times the former.
Principal messages are that, to be realistic, probabilistic appraisal of oil and gas resources in unexplored and partially explored regions must account for multiple sources of dependencies, and that copulas are useful for doing so.