Abstract
The U.S. Geological Survey is currently conducting a national assessment of carbon dioxide (CO2) storage resources, mandated by the Energy Independence and Security Act of 2007. Pre-emission capture and storage of CO2 in subsurface saline formations is one potential method to reduce greenhouse gas emissions and the negative impact of global climate change. As in many large-scale resource assessments, the area under investigation is split into smaller, more manageable storage assessment units (SAUs), which must be aggregated with correctly propagated uncertainty to the basin, regional, and national scales. The aggregation methodology requires two types of data: marginal probability distributions of storage resource for each SAU, and a correlation matrix obtained by expert elicitation describing interdependencies between pairs of SAUs. Dependencies arise because geologic analogs, assessment methods, and assessors often overlap. The correlation matrix is used to induce rank correlation, using a Cholesky decomposition, among the empirical marginal distributions representing individually assessed SAUs. This manuscript presents a probabilistic aggregation method tailored to the correlations and dependencies inherent to a CO2 storage assessment. Aggregation results must be presented at the basin, regional, and national scales. A single-stage approach, in which one large correlation matrix is defined and subsets are used for different scales, is compared to a multiple-stage approach, in which new correlation matrices are created to aggregate intermediate results. Although the single-stage approach requires determination of significantly more correlation coefficients, it captures geologic dependencies among similar units in different basins and it is less sensitive to fluctuations in low correlation coefficients than the multiple-stage approach. Thus, subsets of one single-stage correlation matrix are used to aggregate to basin, regional, and national scales.
References
Benson SM, Cook P (2005) Underground geological storage. In: IPCC Special Report on carbon dioxide capture and storage, chap 5. Intergovernmental panel on climate change. Cambridge University Press, Cambridge
Blondes MS, Brennan ST, Merrill MD, Buursink ML, Warwick PD, Cahan SM, Cook TA, Corum MD, Craddock WH, DeVera CA, Drake RM, Drew LJ, Freeman PA, Lohr CD, Olea RA, Roberts-Ashby TL, Slucher ER, Varela BA (2013) National assessment of geologic carbon dioxide storage resources—methodology implementation: U.S. Geological Survey Open-File Report 2013-1055
Blondes MS, Schuenemeyer JH, Drew LJ, Warwick PD (in press) Probabilistic aggregation of individual assessment units in the U.S. Geological Survey national CO2 sequestration assessment. Energy Procedia
Brennan ST, Burruss RC, Merrill MD, Freeman PA, Ruppert LF (2010) A probabilistic assessment methodology for the evaluation of geologic carbon dioxide. U.S. Geological Survey Open File Report 2010-1127
Carter PJ, Morales E (1998) Probabilistic addition of gas reserves within a major gas project. In: Society of Petroleum Engineers Asia Pacific oil and gas conference and exhibition, paper #50113
Chen Z, Osadetz KG, Dixon J, Dietrich J (2012) Using copulas for implementation of variable dependencies in petroleum resource assessment: example from Beaufort–Mackenzie Basin, Canada. AAPG Bull 96(3):439–457
Crovelli RA (1985) Comparative-study of aggregations under different dependency assumptions for assessment of undiscovered recoverable oil resources in the world. J Int Ass Math Geol 17(4):367–374
Daneshkhah A, Oakley JE (2010) Eliciting multivariate probability distributions. In: Böcker K (ed) Rethinking risk measurement and reporting, vol 1. Risk Books, London
Delfiner P, Barrier R (2008) Partial probabilistic addition: a practical approach for aggregating gas resources. SPE Reservoir Eval Eng 11:379–385
Elton EJ, Gruber MJ (1995) Modern portfolio theory and investment analysis. Wiley, New York
Energy Independence and Security Act of 2007. United States Public Law 110–140, Section 711
Frigessi A, Løland A, Pievatolo A, Ruggeri F (2010) Statistical rehabilitation of improper correlation matrices. Quant Finance 11(7):1081–1090
Higham NJ (2002) Computing the nearest correlation matrix—a problem from finance. IMA J Numer Anal 22(3):329–343
Kaufman GM, Faith RE, Schuenemeyer JH (2013) Predictive probability distributions for petroleum unit resource projections via hierarchical modeling. MIT Sloan Research Paper No. 4981-12
Markowitz H (1952) Portfolio selection. J Finance 7:77–91
Meyer MA, Booker JM (2001) Eliciting and analyzing expert judgement, a practical guide. ASA-SIAM Series on Statistics and Applied Probability, Alexandria, VA
Numpacharoen K, Bunwong K (2012) An intuitively valid algorithm for adjusting the correlation matrix in risk management and option pricing. Social Science Research Network. http://ssrn.com/abstract=1980761
Olea R (2011) On the use of the beta distribution in probabilistic resource assessments. Nat Resour Res 20:377–388
Pike R (2008) How much oil is really there? Making correct statistics bring reality to global planning. Significance 5:149–152
R Core Team (2012) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Schuenemeyer JH (2005) Methodology for the 2005 USGS assessment of undiscovered oil and gas resources, Central North Slope, Alaska. U.S. Geological Survey Open File Report 2005-1410
Schuenemeyer J, Gautier D (2010) Aggregation methodology for the circum-arctic resource appraisal. Math Geosci 42(5):583–594
U.S. Geological Survey (1995) 1995 National Assessment of United States Oil and Gas Resources. U.S. Geological Survey Circular 1118
U.S. Geological Survey (1998) 1998 Assessment of Undiscovered Deposits of Gold, Silver, Copper, Lead, and Zinc in the United States. U.S. Geological Survey Circular 1178
U.S. Geological Survey Geologic Carbon Dioxide Storage Resources Assessment Team (in press) National assessment of geologic carbon dioxide storage resources—results. U.S. Geological Survey Circular 1386
Van Elk JF, Gupta R (2010) Probabilistic aggregation of oil and gas field resource estimates and project portfolio analysis. SPE Reservoir Eval Eng 13:72–81
Watkins DS (2010) Fundamentals of matrix computations. Wiley, New York
Acknowledgments
The USGS supported this work as part of the USGS Geologic Carbon Dioxide Storage Resources Assessment. We thank three anonymous reviewers, and Emil Attanasi and Gordon Kaufman for internal USGS reviews.
Appendices
Appendix 1: R Code description and example
The aggregation routines are written in R, an open-source programming language. The latest version may be found at www.r-project.org. The input files used in this example are assumed to be in CSV format.
1.1 Introduction
There are two input data sets:
1. SAU simulation runs. These are in a t × n matrix of independently generated results, where t is the number of simulations (trials) and n is the number of SAUs to be aggregated.
2. A user-specified pairwise n × n “correlation” matrix.
There are three output data sets:
1. A t × n matrix of sample numbers needed to induce the user-specified correlation. This is an intermediate data set.
2. A t × (n + 1) matrix where the first column is the aggregate sum of the n SAU values for the t trials. The remaining n columns are the corresponding sample numbers.
3. A vector of summary statistics.
There are four R programs (functions):
1. matrixp.fn checks whether the user-specified pairwise correlation matrix is symmetric and positive semidefinite, and hence a proper correlation matrix. If it is not a proper correlation matrix, the closest correlation matrix in the Frobenius norm is computed using the method of Higham (2002). The resultant proper correlation matrix is written as a CSV file.
2. sampnum.fn generates sample numbers using a Cholesky decomposition of the proper correlation matrix to induce the specified correlation among SAUs.
3. COagg.fn aggregates the simulation data using the user-specified or adjusted correlation matrix.
4. COaggsum.fn creates summary statistics from the COagg.fn output.
1.2 Small examples
Two examples are presented. Both use three SAUs, called V1–V3. The first has a correlation matrix, Mt1cor.csv, that is positive semidefinite; the second, Mt2cor.csv, is not. Both use the same data set of t = 10 trials, stored as Mt1dat.csv and Mt2dat.csv, respectively.
1.2.1 Example: Mt1
Input data files:
File: Mt1dat.csv and Mt2dat.csv
V1 | V2 | V3 |
---|---|---|
54 | 375 | 43 |
63 | 546 | 49 |
28 | 468 | 61 |
99 | 422 | 55 |
87 | 336 | 22 |
86 | 307 | 33 |
45 | 296 | 29 |
26 | 186 | 41 |
22 | 418 | 29 |
20 | 416 | 51 |
File: Mt1cor.csv
V1 | V2 | V3 |
---|---|---|
1 | 0.9 | 0.2 |
0.9 | 1 | 0.3 |
0.2 | 0.3 | 1 |
The functions are executed as follows:
matrixp.fn(Mt1)
[1] “Fnorm = ” “0” “ Max abs diff = ” “0”
The above result implies that the correlation matrix Mt1cor is positive semidefinite. No adjustment was needed; however, a new matrix Mt1sq.csv, identical to Mt1cor.csv, was written to the user’s directory for use by function sampnum.fn.
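As a rough illustration of the kind of check described for matrixp.fn, the following base-R sketch tests Mt1cor for symmetry, a unit diagonal, and nonnegative eigenvalues. The helper name is_proper_cor, the unit-diagonal test, and the tolerance are assumptions made for this example; they are not taken from the authors' code.

```r
# Hypothetical helper (not the authors' matrixp.fn): returns TRUE when C is
# symmetric, has a unit diagonal, and has no eigenvalue below -tol.
is_proper_cor <- function(C, tol = 1e-8) {
  symmetric <- isTRUE(all.equal(C, t(C), check.attributes = FALSE))
  unit_diag <- all(abs(diag(C) - 1) < tol)
  ev <- eigen(C, symmetric = TRUE, only.values = TRUE)$values
  symmetric && unit_diag && min(ev) >= -tol
}

# Values of Mt1cor.csv from the example above
Mt1cor <- matrix(c(1.0, 0.9, 0.2,
                   0.9, 1.0, 0.3,
                   0.2, 0.3, 1.0), nrow = 3, byrow = TRUE)

mt1_ok <- is_proper_cor(Mt1cor)
print(mt1_ok)
```

A matrix that passes this check needs no adjustment, which is consistent with the zero Fnorm reported above.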
sampnum.fn(Mt1,23,10)
Here 23 is the random number seed and 10 is the number of trials (t); the default number of trials is 10,000. If this program executes correctly there is no output in R; however, the following sample number file, Mt1sn.csv, is written to the user’s directory.
File: Mt1sn.csv
V1 | V2 | V3 |
---|---|---|
4 | 5 | 6 |
1 | 3 | 2 |
2 | 2 | 1 |
5 | 4 | 8 |
6 | 7 | 4 |
3 | 1 | 5 |
8 | 6 | 7 |
9 | 8 | 10 |
7 | 9 | 3 |
10 | 10 | 9 |
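One common way to obtain sample numbers of this kind is to correlate independent normal scores with a Cholesky factor and then keep only their within-column ranks. The sketch below follows that idea under stated assumptions: the function name sample_numbers and the use of normal scores are choices made for illustration, not the authors' sampnum.fn, and the seed is not expected to reproduce Mt1sn.csv.

```r
# Hypothetical sketch of rank-correlation induction; not the authors' sampnum.fn.
sample_numbers <- function(C, n_trials = 10000, seed = 23) {
  set.seed(seed)
  n <- ncol(C)
  Z <- matrix(rnorm(n_trials * n), nrow = n_trials, ncol = n)  # independent normal scores
  U <- chol(C)             # upper-triangular factor with t(U) %*% U equal to C
  X <- Z %*% U             # columns of X now have correlation matrix C
  apply(X, 2, rank)        # ranks 1..n_trials within each column
}

Mt1cor <- matrix(c(1.0, 0.9, 0.2,
                   0.9, 1.0, 0.3,
                   0.2, 0.3, 1.0), nrow = 3, byrow = TRUE)

sn <- sample_numbers(Mt1cor, n_trials = 10000, seed = 23)
rho <- cor(sn, method = "spearman")   # rank correlation actually achieved
print(round(rho, 2))
```

Note that chol() requires a positive definite matrix, so a singular correlation matrix would need separate handling. With normal scores, the achieved rank correlation tends to fall slightly below the target coefficient, and with only t = 10 trials the achieved values can differ noticeably from the target.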
COagg.fn(Mt1,2)
The argument 2 above selects the user-specified correlation. The other options are independence (1) and fractile additivity (3). Fractile additivity assumes a large sample size. If this program executes correctly there is no output in R; however, the following aggregation and sample number file, Mt1ResC.csv, is written to the user’s directory. The C in the file name refers to the user-specified correlation matrix.
File: Mt1ResC.csv
Sum | sn1 | sn2 | sn3 |
---|---|---|---|
446 | 4 | 5 | 6 |
356 | 1 | 3 | 2 |
340 | 2 | 2 | 1 |
432 | 5 | 4 | 8 |
505 | 6 | 7 | 4 |
253 | 3 | 1 | 5 |
551 | 8 | 6 | 7 |
570 | 9 | 8 | 10 |
560 | 7 | 9 | 3 |
700 | 10 | 10 | 9 |
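The sums in Mt1ResC.csv are consistent with reading each sample number as a rank into the sorted values of the corresponding SAU column: for the first trial, the 4th smallest V1 value (28), the 5th smallest V2 value (375), and the 6th smallest V3 value (43) add to 446. The base-R sketch below applies that reading to the example data; the variable names are illustrative and the code is not the authors' COagg.fn.

```r
# Mt1dat.csv: t = 10 trials (rows) for SAUs V1-V3 (columns)
Mt1dat <- matrix(c(54, 375, 43,
                   63, 546, 49,
                   28, 468, 61,
                   99, 422, 55,
                   87, 336, 22,
                   86, 307, 33,
                   45, 296, 29,
                   26, 186, 41,
                   22, 418, 29,
                   20, 416, 51), ncol = 3, byrow = TRUE)

# Mt1sn.csv: sample numbers written by sampnum.fn
Mt1sn <- matrix(c( 4,  5,  6,
                   1,  3,  2,
                   2,  2,  1,
                   5,  4,  8,
                   6,  7,  4,
                   3,  1,  5,
                   8,  6,  7,
                   9,  8, 10,
                   7,  9,  3,
                  10, 10,  9), ncol = 3, byrow = TRUE)

sorted <- apply(Mt1dat, 2, sort)   # each SAU column in ascending order

# For trial i and SAU j, take the Mt1sn[i, j]-th smallest value of column j
picked <- sapply(seq_len(ncol(sorted)), function(j) sorted[Mt1sn[, j], j])

agg_sum <- rowSums(picked)         # aggregate storage per trial
print(agg_sum)
# 446 356 340 432 505 253 551 570 560 700
```

Reordering sorted marginals in this way leaves each SAU's empirical distribution unchanged and alters only how values are paired across SAUs.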
COaggsum.fn(Mt1ResC)
If this program executes correctly there is no output in R; however, the following summary statistics file, Mt1ResCsum.csv, is written to the user’s directory.
File: Mt1ResCsum.csv
Statistic | Value |
---|---|
min | 253 |
P05 | 292 |
P25 | 375 |
P50 | 476 |
P75 | 558 |
P95 | 642 |
Max | 700 |
Mean | 471 |
Std dev | 132 |
n | 10 |
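The summary values can be checked directly from the Sum column of Mt1ResC.csv. The sketch below assumes R's default quantile definition (type 7); the text does not state which definition COaggsum.fn uses, although this choice reproduces the listed values after rounding to whole numbers.

```r
# Sum column of Mt1ResC.csv
agg_sum <- c(446, 356, 340, 432, 505, 253, 551, 570, 560, 700)

# Percentiles with R's default (type 7) interpolation; an assumption here
q <- quantile(agg_sum, probs = c(0.05, 0.25, 0.50, 0.75, 0.95), names = FALSE)

stats <- c(min = min(agg_sum),
           P05 = q[1], P25 = q[2], P50 = q[3], P75 = q[4], P95 = q[5],
           Max = max(agg_sum),
           Mean = mean(agg_sum),
           "Std dev" = sd(agg_sum),   # sample standard deviation (n - 1 divisor)
           n = length(agg_sum))
print(stats)
```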
1.2.2 Example: Mt2
The major difference between this and the previous (Mt1) example is that Mt2cor is not a positive semidefinite matrix. Thus only the matrix adjustment step is illustrated. The simulation results Mt2dat are the same as Mt1dat. The correlation matrix Mt2cor.csv is:
File: Mt2cor.csv
V1 | V2 | V3 |
---|---|---|
1 | 0.9 | −0.3 |
0.9 | 1 | 0.3 |
−0.3 | 0.3 | 1 |
matrixp.fn(Mt2)
[1] “Fnorm = ” “0.08792” “ Max abs diff = ” “0.04884”
A non-zero Fnorm implies that the input correlation matrix is not positive semidefinite. A new proper correlation matrix Mt2sq.csv is written to the user’s directory for use by function sampnum.fn. It is shown below.
File: Mt2sq.csv
V1 | V2 | V3 |
---|---|---|
1 | 0.85 | −0.27 |
0.85 | 1 | 0.27 |
−0.27 | 0.27 | 1 |
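For readers who want to experiment with this adjustment, the following base-R sketch applies alternating projections with Dykstra's correction, the scheme described by Higham (2002), to Mt2cor. The function name nearest_cor, the iteration limit, and the stopping tolerance are assumptions for this example, and the authors' matrixp.fn may differ in its details and stopping rule.

```r
# Hypothetical sketch of a nearest-correlation-matrix search by alternating
# projections with Dykstra's correction; not the authors' matrixp.fn.
nearest_cor <- function(A, max_iter = 1000, tol = 1e-10) {
  Y <- A
  dS <- matrix(0, nrow(A), ncol(A))          # Dykstra correction term
  for (k in seq_len(max_iter)) {
    R <- Y - dS
    e <- eigen(R, symmetric = TRUE)
    # Project onto the positive semidefinite cone: clip negative eigenvalues
    X <- e$vectors %*% (pmax(e$values, 0) * t(e$vectors))
    dS <- X - R
    Y_new <- X
    diag(Y_new) <- 1                         # project onto unit-diagonal matrices
    converged <- norm(Y_new - Y, type = "F") < tol
    Y <- Y_new
    if (converged) break
  }
  Y
}

# Values of Mt2cor.csv from the example above
Mt2cor <- matrix(c( 1.0, 0.9, -0.3,
                    0.9, 1.0,  0.3,
                   -0.3, 0.3,  1.0), nrow = 3, byrow = TRUE)

min_eig <- min(eigen(Mt2cor, symmetric = TRUE, only.values = TRUE)$values)
print(min_eig)                               # negative, so Mt2cor is not PSD

Mt2adj <- nearest_cor(Mt2cor)
fnorm <- norm(Mt2adj - Mt2cor, type = "F")   # Frobenius distance to the input
print(round(Mt2adj, 2))
print(fnorm)
```

Because the iteration stops at a finite tolerance, the smallest eigenvalue of the result may be a tiny negative number rather than exactly zero, and chol() could reject it without a small further adjustment.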
The R code to execute these functions is given as a text file in Appendix 2.
Appendix 2: R Code
Cite this article
Blondes, M.S., Schuenemeyer, J.H., Olea, R.A. et al. Aggregation of carbon dioxide sequestration storage assessment units. Stoch Environ Res Risk Assess 27, 1839–1859 (2013). https://doi.org/10.1007/s00477-013-0718-x
DOI: https://doi.org/10.1007/s00477-013-0718-x