nach oben

Erschienen in:

Open Access 2018 | OriginalPaper | Buchkapitel

15. Statistical Parametric Mapping for Geoscience Applications

verfasst von : Sean A. McKenna

Erschienen in: Handbook of Mathematical Geosciences

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Patentsuche

Aus

Abstract

Spatial fields represent a common representation of continuous geoscience and environmental variables. Examples include permeability, porosity, mineral content, contaminant levels, seismic impedance, elevation, and reflectance/absorption in satellite imagery. Identifying differences between spatial fields is often of interest as those differences may represent key indicators of change. Defining a significant difference is often problem specific, but generally includes some measure of both the magnitude and the spatial extent of the difference. This chapter demonstrates a set of techniques available for the detection of anomalies in difference maps represented as multivariate spatial fields. The multiGaussian model is used as a model of spatially distributed error and several techniques based on the Euler characteristic are employed to define the significance of the number and size of excursion sets in the truncated multiGaussian field. This review draws heavily on developments made in the field of functional magnetic resonance imaging (fMRI) and applies them to several examples motivated by environmental and geoscience problems.

15.1 Introduction

A general problem in geological and environmental investigations is rapid and accurate identification of anomalous measurements from one, two or three-dimensional data. Example applications include cluster identification in spatial point processes (e.g., Byers and Raftery 1998; Cressie and Collins 2001) detection of anomalies in remotely sensed imagery (e.g., Stein et al. 2002) and identification of anomalous clusters in lattice data (e.g. Goovaerts 2009). The problem of anomaly detection is complicated when the data set is composed of more than a handful of variables (multi-variate) and becomes even more complex when the multiple variables comprise a random field exhibiting spatial correlation.

The temporal and/or spatial correlation of the data rules out the application of standard statistical tests for change detection and has also limited the development of hypothesis testing techniques for correlated data (Gilbert 1987). For applications with correlated data, simulation techniques can often be used to develop the null distribution, but development of closed form hypothesis tests for analysis of the spatial random fields associated with geostatistics has remained sparse.

One approach to detection of anomalies in spatially correlated data are Local Indicators of Spatial Association (LISA) statistics (Anselin 1995; Goovaerts et al. 2005; Goovaerts 2009). These tests focus on the local relationships between adjacent cells and explore combinations of cells defined with an adjacency matrix and or a moving window visiting all cells in a lattice. A very different approach is to model the difference between images as a continuous random field and use properties of an underlying random field model to identify anomalies.

Change detection in spatial-temporal data sets has received considerable attention over the past 15–20 years within the medical imaging research community (Brett et al. 2003; Friston et al. 1994, 1995; Worsley et al. 1992, 1996) and a significant development of this research has been Statistical Parametric Mapping (SPM).

The practice of statistical parametric mapping has been developed in the field of medical imaging, particularly in brain imaging, and in the practice of functional magnetic resonance imaging (fMRI) of the brain while the subject is performing various tasks (functions). Friston et al. (1995, p. 190) provide a concise definition of SPM: “one proceeds by analyzing each voxel using any (univariate) statistical parametric test. The resulting statistics are assembled into an image, that is then interpreted as a spatially extended statistical process”. In other words, at each pixel (voxel) in an image, a univariate statistical test (e.g., t-test) is applied and the resulting values of the test statistic at each pixel are then displayed as a map. The underlying spatial correlation of the map is used in creating a multivariate statistical model that describes that map and this model can be used for inference. Typically, the resulting map is analyzed using theory that underlies stationary Gaussian fields and techniques developed for excursion sets of these fields. Properties of truncated Gaussian fields (e.g., Adler and Hasofer 1976; Adler 1981; Adler and Taylor 2007; Adler et al. 2009) serve as the basis of the SPM techniques.

To date, the SPM approach has not been applied outside of medical imaging, but it appears to be a technique that could be successfully applied in a number of areas of interest in the earth and environmental sciences. The goals of this work are to both describe the basis of SPM and then apply SPM to example problems.

15.2 Anomaly Detection with Statistical Parametric Mapping

Anomaly detection is defined here as the identification of a region in time and/or space that is anomalous in its shape, size (duration) and/or values within the region (intensity). Two modes to anomaly detection in spatial-temporal data sets can be defined: (1) Anomaly detection in an online mode where prior data are used to predict future values of the measured variable and anomalies occur in areas and/or times where the predictions are inconsistent with the corresponding measurements; (2) Anomaly detection as the difference between two classes of data where differences in some treatment or external forcing condition is suspected to cause a difference in the measured variable. The anomalies in this case are significant differences in measured variables observed with and without activation of the external condition. This latter case is the focus of the work in this chapter.

Specifically, an ensemble of geologic models can be created in 1, 2 or 3 dimensions where each member of the ensemble is associated with a specific “treatment” or “result” that can be used to group ensemble members into separate classes. As examples:

An ensemble of 3D geostatistical realizations of porosity can be created conditioned to a single set of observations where two different variogram models, both of which fit the available data equally well, are used. The different variogram models constitute a “treatment” and the question arises as to whether or not the treatments create significant differences (anomalies) in the resulting realizations and where, spatially, those differences occur.
Petrophysical logs from different wells intersecting the same reservoir constitute an ensemble of 1-D measurements. When split into groups based on the result of which wells produced a threshold amount of petroleum and which did not, the question arises as to whether or not the petrophysical log profiles are significantly different between the groups? If they are, what portions of the log create this significant difference?

Two measures of anomaly detection can be employed: omnibus and localized (Worsley et al. 1992). Omnibus detection uses a set of calculations to determine if the current curve, map or volume, taken as a whole, is anomalous. Localized detection determines the specific location(s) within the study domain where the anomaly occurs and are the focus of this work.

Anomaly detection is not done directly on the observed generated or observed ensemble members, but on a difference between groups of members as defined through the treatment or result. Here, the differences are calculated as the differences of two average values. The averages are calculated at each point, pixel or voxel within the domain using standard univariate statistical tests (e.g., t-test). Each pixel-wise average is calculated over a set of ensemble members created under a specific condition (treatment) or generating a specific result. For example, in studies of the human brain, images are often collected under “resting” and “stimulated” conditions and the average image from each condition is then used to create a difference map.

SPM was developed to directly address the problem of spatial correlation in statistical testing. Direct application of most statistical tests requires independence of the observations, but for many problems, including those studied here, correlation between adjacent observations is the norm. Therefore, the results of the statistical tests for adjacent, or even nearby, pixels cannot be effectively evaluated using standard techniques. SPM considers a single map comprised of the results of all local (pixel-wise) statistical tests and provides several measures for comparison of the values in the map to critical threshold levels.

15.2.1 MultiGaussian Fields

The basis of the SPM approach is the analysis of the number, size and degree of excursions from a multiGaussian (mG) random field. For a concise, statistical description of mG fields, see Adler et al. (2009, p. 27). Stationary multiGaussian fields are fully defined by a mean and covariance matrix. In a practical sense, values at each pixel are defined with a Gaussian distribution. The correlation between those multiple distributions is defined by the covariance. Spatial correlation can be added to an uncorrelated field through the convolution of a smoothing kernel with an uncorrelated (white noise) field. As an example, the 2D Gaussian kernel is defined as

$$ {\text{G(x,y)}} = \frac{1}{{2\pi \left|\Sigma \right|^{1/2} }}exp\left( { - \frac{1}{2}d\Sigma ^{ - 1} {\text{d}}^{\text{T}} } \right) $$

where d is the distance vector containing distances d_x and d_y from any location (x, y) to the origin of the Gaussian function x₀, y₀ (here (0, 0) for the standard normal distribution). In this work, the covariance matrix, Σ = σ²I, (I = identity matrix) is diagonal for the specific case of the kernel being aligned with the grid axes.

An often-used measure of the spatial bandwidth of a smoothing kernel in the image processing literature is the “full width at half maximum” (FWHM). For the Gaussian kernel above, the FWHM is:

$$ FWHM = \sigma \sqrt {8{ \ln }(2)} $$

If the mG field is not created, but is obtained from some type of imagery or other analyses, then there is no known underlying kernel and it is necessary to estimate the FWHM directly from the image. Estimation can be done using the covariance matrix of the partial derivatives of the image values, T, with respect to the discretization of the image. In 2D, the covariance matrix is:

$$ \Lambda = \left[ {\begin{array}{*{20}c} {Var\left( {\frac{\partial T}{\partial x}} \right)} & {Cov\left( {\frac{\partial T}{\partial x},\frac{\partial T}{\partial y}} \right)} \\ {Cov\left( {\frac{\partial T}{\partial x},\frac{\partial T}{\partial y}} \right)} & {Var\left( {\frac{\partial T}{\partial y}} \right)} \\ \end{array} } \right] $$

This covariance matrix can be interpreted as a measure of the roughness/smoothness of the image.

Estimation of Λ can be achieved through several approaches and here the simple relationship defined by Worsley et al. (1992) between the FWHM values in each of the principal directions and Λ is utilized. The derivatives in the covariance matrix of an image can be approximated numerically in each spatial dimension with differences between adjacent pixels are calculated as:

$$ \begin{aligned} Z_{xi} (x,y) & = \left\{ {T_{i} \left( {x + \delta x,y} \right) - T_{i} (x,y)} \right\}/\delta_{x} \\ Z_{yi} (x,y) & = \left\{ {T_{i} \left( {x,y + \delta y} \right) - T_{i} (x,y)} \right\}/\delta_{y} \\ \end{aligned} $$

where δ_x and δ_y are the dimensions of the image pixels in the x and y directions. The variances and covariances of the differences are then used to approximate the variances and covariances of the derivatives:

$$ \begin{array}{*{20}c} {V_{xx} = \sum\limits_{i,x,y,z} {Z_{xi} (x,y,z)^{2} /N(n - 1)} } \\ {V_{yy} = \sum\limits_{i,x,y,z} {Z_{yi} (x,y,z)^{2} /N(n - 1)} } \\ {V_{xy} = \sum\limits_{i,x,y,z} {\left\{ {Z_{xi} (x,y,z) + Z_{xi} \left( {x,y + \delta_{y} ,z} \right)} \right\}\left\{ {Z_{xi} (x,y,z) + Z_{xi} \left( {x + \delta_{x} ,y,z} \right)} \right\}/4N(n - 1)} } \\ \end{array} $$

These variance and covariance estimates are used to estimate Λ:

$$ \Lambda = \left[ {\begin{array}{*{20}c} {V_{xx} } & {V_{xy} } \\ {V_{xy} } & {V_{yy} } \\ \end{array} } \right] $$

Finally, the FWHM in the X and Y directions are calculated as:

$$ \begin{aligned} FWHM_{x} & = \sqrt {\frac{4\ln (2)}{{V_{xx} }}} \\ FWHM_{y} & = \sqrt {\frac{{4 { \ln }(2)}}{{V_{yy} }}} \\ \end{aligned} $$

15.2.2 Calculating the SPM

The Statistical Parametric Map is the difference image between individual pairs of images or average images, which is typically transformed from a map of t-statistics to a map of Gaussian Z-score values. The different methods used in this study for calculating the SPM are described in this section.

15.2.2.1 Conditional Differences

The t-test and t-statistic are used exclusively in this chapter for the conditional differences between two ensembles and a review of the t-statistic is provided in the Appendix. It is noted that other statistical tests and their resulting test-statistics, e.g., χ, Z, f, as well as measures of correlation can also be used as the basis of an SPM. For the t-tests employed here, a location (pixel)-specific calculation of the standard deviation is used. Another approach is to calculate the pooled standard deviation across the image (image-based) and arguments for using the image-based standard deviation are given by Worsley et al. (1992). In typical applications, the number of observations under each condition is small, near a dozen, and therefore the effective degrees of freedom for T(x, y) is generally small and needs to be used in the transformation of the t-field to a standard normal Gaussian Z-field.

The cumulative probability of a t-statistic is found from the t-distribution function with the appropriate degrees of freedom. This probability is then used with the inverse of the Gaussian distribution function to get the z-score value:

$$ \begin{aligned} & P(Y \le y) = T(y;v) \\ & z = G^{ - 1} \left( {P(Y \le y)} \right) \\ \end{aligned} $$

The resulting fields are now multiGaussian SPM’s and the anomaly detection algorithms developed for SPM analysis can be applied.

15.2.2.2 Isolated Regions of Activation

Anomaly detection here is focused on the number, size and location of regions within an SPM that is a curve/image/volume that exceed a given threshold level, u. These regions are known as “regions of activation”, “regions of exceedance” or “excursions”. The numbers, sizes and locations of these excursions are then compared against a reference model of the expected expression of such regions. Truncation of a Gaussian field at a threshold u defines the u-level excursion set:

$$ X_{u} = \left\{ {x \in R^{D} :Y(x) \ge u} \right\} $$

A large body of literature on the properties of excursion sets (regions of exceedence) in Gaussian random fields is available (e.g., Adler et al. 2009; Friston et al. 1994; Lantuejoul 2002). Friston et al. (1994) characterize three related properties of excursion sets in truncated Gaussian random fields:

the number of pixels above the truncation threshold, u,

the number of distinct regions (inclusions) above the threshold, and

the number of pixels in each region,

with expectation relationship E[N] = E[m]E[n]. For threshold value, u, the number of cells above that threshold, N, is provided by the Gaussian cdf and the size of the domain, S:

$$ {\text{E}}[{\text{N}}] = S\int_{u}^{\infty } {(2\pi )^{ - 1/2} e^{{ - z^{2} /2}} dz} $$

A measure of the number of isolated regions above the threshold can be obtained from the Euler Characteristic, EC. In two dimensions, the EC represents the number of connected excursion sets in the domain minus the total number of holes within those sets. Therefore, EC goes to 0.0 at u = 0 and EC becomes negative when u < 0.0 as the truncated field represents a single domain-spanning set containing a large number of holes. In 2D, and at relatively high truncation thresholds, EC is equivalent to the number of regions above the threshold, E[m].

$$ \text{E}[m] = EC = \left| {(2\pi )^{ - ((D - 1))/2} W^{ - D} Su^{(D - 1)} e^{{u^{2} /2}} } \right| $$

where D is the dimension of the domain and W is an alternative measure of the spatial correlation of the mG field defined as a fraction of the FWHM:

$$ {\text{W}} = {\text{FWHM}}/\sqrt {4 { \ln }(2)} $$

For a given threshold, u. the average area of the individual regions is found from the expectation relationship:

$$ {\text{E[n]}} = {\text{E[N]}}/{\text{E[m]}} = {\text{E[N]}}/ [ {\text{EC]}} $$

Figure 15.1 compares a direct calculation of EC on a multiGaussian field using the Matlab Image Processing toolbox (Matlab 2009) with estimates made using the Euler characteristic equation above across a range of u values increasing from left to right. Deviations between the calculated and estimated number of excursions indicate deviations from the definition of a multiGaussian field. The corresponding binary fields (500 × 500 cells) are also shown for several representative threshold values. Note, that typically the extreme ends of the graph corresponding to u values (truncation thresholds) with absolute values of 2.5 or greater are of interest.

15.2.3 Localized Anomaly Detection

Further analysis of the excursion sets is focused on the size and location of the detected anomalies. The excursion set maps themselves can be examined to determine the location of where the excursions are occurring. An extremely localized, yet very strong anomaly will be of interest. An anomaly with a much lower amplitude but greater spatial extent may also be of interest. The definition of spatial extent (size) of any anomaly is defined relative to the spatial correlation length of the field in which it is detected. The size of the anomaly is expressed through truncation of the field at a threshold value and defining the size of the excursion regions above that threshold.

In general, the significance of any anomaly in a spatial field is a function of its amplitude (intensity or strength) and its spatial extent (size). The observed SPM is compared against a specified multivariate spatial random field with a defined correlation length that serves as the model of the null hypothesis for the differences between two ensembles of spatial fields. Truncation of the observed SPM at a given threshold level creates regions of excursions above that threshold and the significance of the number and size of these excursions relative to the model of the null hypothesis is calculated. As in classical statistical hypothesis testing, the p-value defines the chance that the observed anomaly would occur under the null hypothesis. Here, the focus is on identifying the largest region of excursion for a specified threshold and calculating the chances of that anomaly occurring under the null hypothesis.

The pre-processing steps and the approach used for application of statistical parametric mapping to detection of significant excursion sets is outlined here and these steps are then applied to an example problem. The focus is on the approach used for calculation of the probability that one or more regions of activation of a certain area, or larger, could have occurred by chance under the constructed mG model. The full development of this approach for medical imaging is provided by Friston (1994) and Worsley et al. (1996). Additionally, Adler (2000) and Taylor and Adler (2003) provide further development of level crossing in random fields and the relationship to the Euler characteristic.

Steps:

(1)

Create an SPM through pixel by pixel application of 1-D (pixelwise) univariate statistical tests. The test statistic values resulting from this test at every point may be distributed as χ², t, F, or other and can be transformed into a Gaussian Z-score to create a Gaussian SPM.

(2)

Smooth the resulting SPM using a Gaussian kernel. The resulting SPM created in step 1 may be coarse and noisy. A small amount of smoothing using a Gaussian filter is enough to create a smoothed SPM.

(3)

Reinflate the variance. The smoothing process in Step 2 decreases the variance of the SPM and a reinflation process is used here to transform the SPM to a unit variance (1.0) for easier interpretation of results. Here, the empirical probability distribution function of the SPM after Step 2 is fit with a Gaussian Mixture Model (GMM) having three components. A quantile-preserving transform is used to transform the empirical cumulative distribution as modeled with the GMM to a cumulative Gaussian distribution. Use of the GMM better preserves the original shape of the distribution relative to simpler transforms such as the normal-score. No translation, or recentering, of the resulting Gaussian distribution is done.

(4)

Calculate the characteristics of the SPM, and choose an exceedance threshold to identify regions of exceedance.

Calculate the FWHM of the smoothed and transformed SPM created in Steps 1–3. The FWHM is derived from the variances and covariances of the spatial derivatives of the SPM. The resulting FWHM values are typically 5–15 times the size of the smoothing kernel used in Step 2.

Identify pixels that are above/below the ± threshold value.

Employ a flood-fill algorithm to determine the sizes of the separate regions of connected pixels, or regions of exceedance and label each region for both positive and negative excursions.

(5)

Apply a hypothesis test to determine the probability of a particular result having occurred under the null hypothesis of the mG model. Here, a test of the chance of obtaining the size of the of largest region of exceedance (excursion) under the null hypothesis of a Gaussian SPM with calculated FWHM is calculated. The significance of the maximum excursion size is calculated using the methods of Friston et al (1994):

The three main features of an SPM are: (1) the number of pixels, N, exceeding a threshold, u. Or the number of pixels in the excursion set; (2) the number, m, of regions above the threshold—the number of connected subsets of the excursion set; (3) the number, n, of pixels in each of the m subsets. The expectations of these three features are related as: E{N} = E{n} • E{m}.

Eq. 14 of Friston et al. (1994) gives the probability of at least one excursion region having a size ≥k pixels:

$$ \begin{aligned} P\left( {n_{\hbox{max} } \ge k} \right) & = \sum\limits_{i = 1}^{\infty } {P(m = i) \cdot \left[ {1 - P(n < k)^{i} } \right]} \\ & = 1 - e^{{ - E[m] \cdot P\left( {n \ge k} \right)}} \\ & = 1 - \exp ( - E[m] \cdot e^{{ - \beta k^{2/D} }} \\ \end{aligned} $$

where β = [Γ(D/2 +1) • E[m]/E[N]]^2/D and D is the dimension of the domain.

Calculations of P(n_max ≥ k) within the (k, u) parameter space for spatial fields with two different correlation lengths (FWHM) are shown in Fig. 15.2. The role of the correlation length of the null hypothesis model is clear from Fig. 15.2 where the probability of an excursion region of 60 pixels or more is approximately 0.001 for a field with a FWHM of 9.0, but is essentially zero (~ 10 × 10⁻¹²) for a field with a FWHM of 3.0.

15.3 Example Problems

Two example problems are used here to demonstrate the calculations and application of SPM to detecting anomalies in spatial random fields. Both example problems are two-dimensional, but the same approaches are applicable to anomaly detection in 1-D and 3-D domains.

15.3.1 Anomaly Detection in Images

A simple simulation study designed to mimic the detection of anomalous regions in either remote sensing or geophysical imagery is used here to test a few of the SPM calculations. The focus is on identifying the largest anomaly above a specified threshold and the significance of that anomaly.

A multiGaussian field is created through geostatistical simulation. The field is comprised of square, 5 × 5 m pixels, and has an isotropic Gaussian variogram with a range of 150 pixels. The field is created in standard normal space, N(0, 1) and the simulated values serve as the observed image. Measurement noise is added to the image by considering the simulated realization value, z(x) to be the mean value of a local Gaussian distribution at every pixel. The standard deviation of the Gaussian at every pixel, σ_z(x), is set to 2.0 and a Gaussian random deviate is drawn and added to z(x) to create the final image. This measurement noise is added independently at every pixel (i.i.d.) and then smoothed prior to adding to the observed image. The amount of spatial smoothing of the noise term is varied and the impact on anomaly detection is examined.

Anomalies are added to the observed image within a circular region having a radius of 90 pixels and centered at the center of the image. Background values within the anomaly region are multiplied by 1.5 creating stronger negative and positive values within the region depending on the sign of the original observed values. The area of the anomaly region is 5027 pixels.

Figure 15.3 shows background images (left column) at two levels of noise smoothing and the background images with the anomalies added (right column). As would occur in any image capture process, the noise values added to each image are drawn randomly and independently from any other image prior to smoothing. This creates subtle differences between the images in each row of Fig. 15.3 even without the addition of anomalies. Detection of the presence of the anomalies through visual comparison of the left and right images in each row of Fig. 15.3 is not obvious, even when the location of the anomaly is known.

The SPM’s are calculated through a pixelwise t-test for comparing two means (Appendix) between the image with and without the anomalies. These t-statistic maps are transformed to Gaussian Z maps that are the SPM (Fig. 15.4). The large anomalies in the center of the image are readily seen along with the dramatic changes in the results due to the increased spatial correlation of the noise component with increased smoothing. Additional SPM’s are created at intermediate levels of smoothing but are not shown here. Results from all levels of noise smoothing are shown in Table 15.1.

Table 15.1

Results of SPM analysis for four levels of noise smoothing

Bandwidth of smoothing filter (pixels)	1.5	3.5	5.5	7.5
FWHM	5.0	23.5	45.0	71.6
Size Max Positive (pixels)	2061	2886	1878	2075
P(n_max > k) Positive	<1.0 × 10⁻¹⁶	1.3 × 10⁻⁰⁶	0.237	0.405
Size Max Negative (pixels)	3640	3664	3772	3417
P(n_max > k) Negative	<1.0 × 10⁻¹⁶	1.5 × 10⁻⁰⁸	0.015	0.200
# regions >+ threshold	238	5	5	1
# regions <− threshold	212	5	2	1
Max SPM value (standard deviations)	8.28	6.45	5.25	4.55
Min SPM value (standard deviations)	−7.83	−6.23	−5.43	−4.58

A threshold of ±2.5 standard deviations is applied to the SPM’s and the excursion regions for the two extreme levels of noise smoothing are shown in Fig. 15.5. There are over 200 positive and 200 negative excursions for the smallest amount of noise smoothing and only 1 positive and 1 negative excursion at the largest amount of smoothing. The size of the excursions that are due to the added anomalies clearly stands out in the left image of Fig. 15.5. Table 15.1 also shows how the maximum and minimum images in the SPM decrease with increased levels of noise smoothing.

With increased smoothing of the noise, the FWHM of the image increases from 5.0 to ~ 72 pixels (Table 15.1). While the size of the largest positive and negative excursions remains approximately constant near 2000 and 3600 pixels, respectively, the p-value for excursions of that size occurring in the image changes dramatically. At the lowest level of smoothing, the chances of getting excursions of size 2061 or 3640 pixels under the Gaussian random field model with a FWHM of 5.0 are essentially zero (< 1.0 × 10⁻¹⁶). However, getting excursion regions of a similar size occurring under greater smoothing of the noise and a FWHM of 71.6 pixels is relatively common at 40 and 20%, respectively. These results demonstrate the strong dependence of P(n_max >= k) on the spatial correlation of the field.

15.3.2 Ground Water Pumping

A general problem in a number of geoscience disciplines is the case where an ensemble of inputs is used in a calculation to provide a probabilistic result to a particular question. The calculation can be relatively simple or complex, but acts as a transfer function to transfer uncertainty in spatially distributed physical properties to uncertainty in an outcome of interest. Examples include groundwater models transferring uncertainty in hydraulic conductivity and recharge into radionuclide transport times; reservoir simulators transferring uncertainty in permeability and porosity into estimated recoverable oil; and simple spatial integration to transfer uncertainty in soil nutrient levels into estimating total crop yield for an agricultural field.

Here, a ground water example problem is used with the SPM approach to detect significant differences between two groups of an ensemble of spatial random fields of transmissivity. The ensemble is split into groups that create high results and all others. The SPM approach is used here to identify statistically significant features within the ensemble of input fields responsible for the specific results. This approach can be considered identification of the significant features in the random fields responsible for a specific result of a process that integrates across the entire field.

15.3.2.1 Problem Setup

The ground water problem is motivated by the regulatory issue of impacts on a nearby wetland due to pumping from a planned water supply well. Well test criteria dictate that the pressure drop (drawdown) at a location 353 m to the northwest of the pumping well must be <2.00 m after pumping at a rate of 250 m³/h for 48 h. To simulate the aquifer test, a 12 × 12 km square domain, with zero-flux boundaries on the north and south and constant-head boundaries on the east and west is defined. Prior to pumping, the fixed head boundaries create steady state flow across the domain. A constant transmissivity, T, of 10.0 m²/h is assumed across the majority of the domain. This constant value is replaced by a heterogeneous T field within the center of the domain. The heterogeneous field is 3500 × 3500 m with 5 × 5 m cells. A large pumping well is set in the center of the domain.

The aquifer is confined in this area and the mean and spatial co-variance of the transmissivity can be estimated from other studies in aquifers of similar age and depositional history. The log10 values of transmissivity within the heterogeneous domain are simulated as a multiGaussian field with an isotropic Gaussian variogram with range 250 m and nugget of 5% of the sill. Transmissivity at the well location is considered known and provides the only conditioning point within the domain. A total of 200 realizations are created, and the 2D, confined, transient ground water flow equation is solved using finite differences on each realization:

$$ \frac{\partial h(x,y)}{\partial t} = \frac{1}{S(x,y)} \cdot \left( {\frac{\partial }{\partial x}T(x,y)\frac{\partial h}{\partial x}} \right) + \left( {\frac{\partial }{\partial y}T(x,y)\frac{\partial h}{\partial y}} \right)\,\pm\,Q(x,y) $$

where (x, y) indicates the spatial location, h (L) is the head (pressure), t is time and Q (L³/T) are sources or sinks—here the pumping rate at the well. Transmissivity, T (L²/T), is spatially heterogeneous within the central domain and for the calculations here, storativity, S (−) is set to a single value of 1.0 × 10⁻⁰⁵ across all locations in the aquifer. The initial conditions for the transient simulation are taken from a steady state head solution using the same input T field. Three example transmissivity realizations and maps of the resulting drawdowns after 48 h of pumping are shown in Fig. 15.6. Figure 15.6 demonstrates that the heterogeneous T field strongly impacts the resulting pressure response in a non-linear manner.

15.3.2.2 Results

For each ground water simulation, the drawdown at the test location (353 m NW of the pumping well) at 48 h is recorded and compared to the regulatory limit, R, of 2.00 m. The T realization is placed into one of two classes: those that meet the pressure drop limit, drawdown <= R, and those that exceed the limit. After 200 ground water simulations, the pixelwise mean and standard deviation within each class are calculated (Fig. 15.7). These four maps provide the input to a two-sample t-test to determine the difference between two means. The resulting map of t-statistics is the SPM. Here the t-statistics are smoothed with Gaussian kernel and transformed to Z-statistics and the Z-score SPM is shown in Fig. 15.8.

The SPM is calculated at every pixel as the mean T value of the fields that created drawdowns exceeding the regulatory threshold, R, minus those that resulted in drawdowns less than or equal to the threshold: T_>R − T_≤R. This convention creates a positive value in the SPM in an area where higher T values are associated with realizations that created exceedance of R and negative values where higher T values created drawdowns ≤R. Figure 15.8 shows regions of positive and negative values, but the dominant anomaly is a high SPM value between the pumping well and the observation point to the northwest. For this example, 155 realizations (77.5%) created drawdowns ≤ R and 45 (22.5%) created drawdowns that exceeded R.

The SPM is truncated at a threshold of ±2.5σ and the excursion regions are defined (Fig. 15.9). The size of the largest excursions and the probability of them occurring under the mG model are shown in Table 15.2. The SPM has a FWHM of 111.5 m (22.3 pixels). The large positive excursion between the pumping well and the monitoring point is significant with a p-value near 1.0 × 10⁻⁰⁴ while the largest negative excursion is not.

Table 15.2

Largest positive and negative excursions in ground water example SPM

FWHM (m)	111.5
Excursions	Positive	Negative
Largest excursion (pixels)	2369	713
P(n_max > k)	~1.0E⁻⁰⁴	0.198

Here the SPM approach also serves as a means of determining the regions of increased sensitivity of drawdown to the T values. As expected, when viewed from the perspective of influencing extreme drawdown values, the T values in the area between the pumping well and the monitoring point are significantly more important than other values in the T field. The remaining regions of excursion do not have any readily discernible connection to the ground water flow dynamics and are consistent with expected excursions in a mG field with this amount of correlation. In practice, the large positive excursion region in the SPM can be used to focus resources for additional data collection, e.g., geophysical survey and/or additional wells.

15.4 Summary

There is a large amount of work reported in the functional MRI literature on the detection of anomalies in spatially correlated fields using SPM. Apart from some work in astrophysics, this SPM work has generally been restricted to medical imaging. The body of knowledge around SPM and the statistical approaches developed for fMRI can be readily applied to problems in the earth and environmental sciences. This chapter reviews some of the major developments from the fMRI literature and demonstrates their application with an image anomaly detection problem and a ground water modelling problem. A strong advantage of SPM is that it directly addresses the challenge of enabling hypothesis testing, including calculation of the significance of the results, in spatially correlated fields.

The example problems chosen here emphasized defining the significance of the largest, positive and negative, anomaly in each SPM. The SPM framework also supports hypothesis testing on non-localized, “omnibus”, features such as the maximum/minimum value of the SPM, the number pixels exceeding the threshold and the number of excursion regions within the SPM. Additionally, hypothesis testing of localized, “focal”, features is also supported including hypothesis testing of the occurrence of any size excursion.

The example problems used here relied on the underlying images being realizations of mG fields, but that is not a requirement. It is the map of the test statistic values defining the differences between fields that is modelled as a mG field, and that flexibility makes SPM applicable to a very general set of problems as the mG model is a standard for differences between images. For example the same approach could be used to compare geologic models with discrete features. Future work will consider the application of other statistical tests within the SPM framework.

<SimplePara><Emphasis Type="Bold">Open Access</Emphasis> This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.</SimplePara> <SimplePara>The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.</SimplePara>

Vorheriges Kapitel Flint Drinking Water Crisis: A First Attempt to Model Geostatistically the Space-Time Distribution of Water Lead Levels

Nächstes Kapitel Water Chemistry: Are New Challenges Possible from CoDA (Compositional Data Analysis) Point of View?

Appendix: Conditional Differences

The t-test is a traditional measure of the difference between two means (e.g., Walpole and Myers 1989). Quite simply, the t-statistic is the difference between two values, at least one of which is a population or sample mean, normalized by the standard error of the mean:

$$ t = \frac{{\overline{X} - \mu }}{{s_{e} }} = \frac{{\overline{X} - \mu }}{{s\sqrt {1/n} }} $$

where $ \overline{X} $ is a sample mean, μ is a population mean, s_e is the standard error of the mean which is the standard deviation of the observations, s, that make up the data vector X multiplied by the square root of 1 over the number of samples within X. The cumulative probabilities for any value of t are available from the Student’s t distribution and require knowledge of the degrees of freedom, ν, within the test. For the analyses done here, ν is generally n − 1.

In the case of comparing two sample means to each other at each location, i.e., A(x, y) and B(x, y), instead of comparing a sample mean to a theoretical population mean, the value of s_e must be calculated from both sample sets as:

$$ S_{e} = S_{p} \sqrt {\frac{1}{{n_{1} }} + \frac{1}{{n_{2} }}} $$

where n₁ and n₂ are the number of images that were used in calculating the average maps A and B and s_p is the average pooled standard deviation:

$$ S_{p} (x,y) = \sqrt {\frac{{\left( {n_{1} - 1} \right)s_{1}^{2} (x,y) + \left( {n_{2} - 1} \right)s_{2}^{2} (x,y)}}{{n_{1} + n_{2} - 2}}} $$

Here we are assuming that n₁ and n₂ are constant for all locations and therefore not a function of (x, y). The t-statistic image (map), based on the pooled standard deviation, is:

$$ t(x,y) = \frac{{\Delta (x,y)}}{{S_{p} (x,y)\sqrt {\frac{1}{{n_{1} }} + \frac{1}{{n_{2} }}} }} $$

Adler RJ, Hasofer AM (1976) Level crossings for random fields. Ann Probab 4:1–12CrossRef

Adler RJ (1981) The geometry of random fields. Wiley, New York

Adler RJ (2000) On excursion sets, tube formulas and maxima of random fields. Ann Probab 10(1):1–74

Adler RJ, Taylor JE (2007) Random fields and geometry. Springer

Adler RJ, Taylor JE, Worsley KJ (2009) Applications of random fields and geometry: foundations and case studies. http://webee.technion.ac.il/people/adler/publications.html. Accessed Jan 2009

Anselin L (1995) Local indicators of spatial association—LISA. Geogr Anal 27(2):94–115

Brett M, Penny W, Kiebel S (2003) An introduction to random field theory, chapter 14. In: Frackowiak RSJ, Friston KJ, Frith C, Dolan R, Friston KJ, Price CJ, Zeki S, Ashburner J, Penny WD (eds) Human brain function, 2nd edn. Academic Press

Byers S, Raftery AE (1998) Nearest-neighbor clutter removal for estimating features in spatial point processes. J Am Stat Assoc 93:577–584CrossRef

Cressie N, Collins LB (2001) Patterns in spatial point locations: local indicators of spatial association in a minefield with clutter. Nav Res Logist. https://doi.org/10.1002/nav.1022

Friston KJ, Worsley KJ, Frackowiak RSJ, Mazziotta JC, Evans AC (1994) Assessing the significance of focal activations using their spatial extent. Hum Brain Mapp 1:210–220CrossRef

Friston KJ, Holmes AP, Worsley KJ, Poline J-P, Firth CD, Frackowiak RSJ (1995) Statistical parametric maps in functional imaging: a general linear approach. Hum Brain Mapp 2:189–210CrossRef

Gilbert RO (1987) Statistical methods for environmental pollution monitoring. Van Nostrand Reinhold, New York, p 320

Goovaerts P, Jacquez GM, Marcus A (2005) Geostatistical and local cluster analysis of high resolution hyperspectral imagery for detection of anomalies. Remote Sens Environ 95(3):351–367CrossRef

Goovaerts P (2009) Medical geography: a promising field of application for geostatistics. Math Geosci 41:243CrossRef

Lantuejoul C (2002) Geostatistical simulation: models and algorithms. Springer, Berlin, p 256CrossRef

Matlab (2009) Image processing toolbox. Mathworks Inc, Natick, MA, USA

McKenna SA, Ray J, van Bloemen Waanders B, Marzouk Y (2011) Truncated multigaussian fields and effective conductance of binary media. Adv Water Resour 34:617–626CrossRef

Stein DWJ, Beaven SG, Hoff LE, Winter EM, Schaum AP, Stocker AD (2002) Anomaly detection from hyperspectral imagery. IEEE Signal Process Mag 19(1):58–69CrossRef

Taylor JE, Adler RJ (2003) Euler characteristics for Gaussian fields on manifolds. Ann Probab 31(2):533–563CrossRef

Walpole RE, Myers RH (1989) Probability and statistics for engineers and scientists, 4th edn. MacMillan Publishing Co., New York, p 765

Worsley KJ, Evans AC, Marrett S, Neelin P (1992) A three-dimensional statistical analysis for rCBF activation studies in human brain. J Cereb Blood Flow Metab 12:900–918CrossRef

Worsley KJ, Marrett S, Neelin P, Vandal AC, Friston KJ, Evans AC (1996) A unified statistical approach for determining significant signals in images of cerebral activation. Hum Brain Mapp 4:58–73CrossRef

Titel: Statistical Parametric Mapping for Geoscience Applications
verfasst von: Sean A. McKenna
Verlag: Springer International Publishing
Buch: Handbook of Mathematical Geosciences
Print ISBN: 978-3-319-78998-9

Electronic ISBN: 978-3-319-78999-6

Copyright-Jahr: 2018
DOI: https://doi.org/10.1007/978-3-319-78999-6_15