Elsevier

Geoderma

Volume 143, Issues 1–2, 15 January 2008, Pages 123-132
Inferences from fluctuations in the local variogram about the assumption of stationarity in the variance

https://doi.org/10.1016/j.geoderma.2007.10.021

Abstract

Geostatistics is commonly used to describe and predict the variation of soil properties over the landscape. However, many geostatistical methods require the assumption that our observed data are a realization of a random function which is intrinsically stationary. Under stationarity, observations of a single realization of the random function at different positions can be treated as a form of replication. There are various ways in which a random function may breach the assumption of intrinsic stationarity, and numerous geostatistical techniques have been developed that are able to cope with some forms of non-stationarity. What is currently needed is a set of diagnostic tools capable of detecting and identifying when data cannot plausibly be treated as a realization of a process which is stationary in the variance.

In this paper, we propose an inferential method that can identify when stationarity in the variance cannot plausibly be assumed. The basis of our approach is to obtain a model for the random function under the assumption of intrinsic stationarity. If the global dataset can be regarded as a realization of a Gaussian process (perhaps after transformation), then the global variogram is sufficient for this purpose. By using a window-based method to locally estimate variograms, we can define some statistic of homogeneity of the sample variation of the data. This allows us to obtain a sampling distribution for this statistic, under the null hypothesis of intrinsic stationarity, by generating multiple realizations of the postulated random function at the original sample points using Monte Carlo methods and recomputing the statistic for each realization. We selected as statistics the interquartile ranges of: i) the spatial dependence ratio (s), the proportion c1 / (c0 + c1), ii) a distance parameter (a), which is the maximum lag over which the random function is autocorrelated for variograms like the spherical, and iii) the local variances (v; c0 + c1), where (c0) is the nugget component and (c1) the spatially structured component. We demonstrated this method using data from the large-scale sampling (n = 1341 over 8248 km²) of the Florida Everglades, United States.
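For orientation, the three variogram parameters named above can be written out for the spherical model, which the abstract gives as an example of a variogram with a finite distance parameter. The formula below is the standard textbook form of the spherical variogram; the symbol γ(h) for the variogram at lag h is a conventional notation assumed here, not taken from the text above.

```latex
\gamma(h) =
\begin{cases}
0, & h = 0, \\[4pt]
c_0 + c_1\left\{\dfrac{3h}{2a} - \dfrac{1}{2}\left(\dfrac{h}{a}\right)^{3}\right\}, & 0 < h \le a, \\[8pt]
c_0 + c_1, & h > a,
\end{cases}
\qquad
s = \frac{c_1}{c_0 + c_1},
\qquad
v = c_0 + c_1 .
```

Under this model, s close to 1 indicates that most of the variance is spatially structured, and s close to 0 indicates that the nugget component dominates.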

Introduction

Soil-forming processes are very complex and our understanding of this complexity is imperfect. An effective way to treat this uncertainty is to model soil properties as realizations of a random function (Webster, 2000). Geostatistics generates models of these random functions, which are then used to describe variation in soil properties at different spatial resolutions and can be used to predict them at unsampled locations. This approach has been successfully applied to a wide range of soil properties, including soil metals (Goovaerts and Webster, 1994) and the composition of soil microbial communities (Brockman and Murray, 1997, Franklin and Mills, 2003) and various other soil properties (Grunwald, 2006). Predictions from geostatistical models can often form the underlying data used in precision farming (Sadler et al., 1998) or in environmental fate and process models (Corwin et al., 1997). The geostatistical model of spatial variation also underlies linear mixed models for spatial data (e.g. Lark and Cullis, 2004).

The principal underlying assumption of geostatistics is that the stochastic process is stationary. Consider a set of N sampling locations along a line (transect), X = [x1,…, xN], then the data obtained at these locations, z(x1),…, z(xN), are samples from the marginal distributions, D1,…, DN, which are projections of the N-variate distribution D of the random function Z(x). We have only one sample from each of the marginal distributions D1,…, DN, resulting in one vector z(X) = [z(x1),…, z(xN)], drawn from D, which is insufficient to estimate the parameters of D. In geostatistics we make inferences about the random function Z(x) by invoking the assumption of stationarity. Under stationarity, the multiple observations z(x1),…, z(xN) provide a kind of replication, i.e., any pair of observations z(xi), z(xi + h) are drawn from a bivariate process with the same distribution. This assumption allows us to infer information about D and ultimately model the random function Z(x).
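The replication argument can be made concrete with a small sketch. Assuming a regularly spaced transect, the code below pools all pairs of observations separated by the same lag and averages their squared differences, which is the usual method-of-moments variogram estimator. The function name `transect_variogram` and the toy data are illustrative assumptions; the text above does not specify an estimator.

```python
import numpy as np

def transect_variogram(z, max_lag):
    """Method-of-moments variogram for a regularly spaced transect.

    For each lag h (in units of the sampling interval), every pair
    z[i], z[i + h] is treated as a replicate of the same increment,
    which is only justified under the stationarity assumption.
    """
    z = np.asarray(z, dtype=float)
    gamma = np.empty(max_lag)
    for h in range(1, max_lag + 1):
        increments = z[h:] - z[:-h]          # all pairs separated by lag h
        gamma[h - 1] = 0.5 * np.mean(increments ** 2)
    return gamma

# Toy transect of four observations (hypothetical values).
gamma_hat = transect_variogram([1.0, 2.0, 4.0, 7.0], max_lag=2)
```

Without stationarity, the pairs at a given lag would not share a common distribution, and averaging over them would not estimate any single quantity.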

The assumption of stationarity in its most general sense implies that the joint distribution of the random function at all locations does not depend on the absolute geographic location of the point samples, i.e. the models of spatially dependent variance (covariance structures) are the same over the entire sampled area. Formally, when a random function is strictly stationary, the joint distribution function at a set of N sample points, x1,…, xN, is invariant when the origin of x1,…, xN is translated. As we discuss below, the strict assumption of stationarity is not necessary in geostatistics, but a minimal assumption of stationarity in the mean (zero) and variance of the increments z(xi) − z(xi + h) is required.
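The minimal assumption on the increments is usually called the intrinsic hypothesis. Written out in its standard form, with γ(h) denoting the variogram (a conventional symbol assumed here rather than taken from the text above), it reads:

```latex
\begin{align}
\mathrm{E}\left[Z(x) - Z(x + h)\right] &= 0, \\
\mathrm{Var}\left[Z(x) - Z(x + h)\right]
  = \mathrm{E}\left[\left\{Z(x) - Z(x + h)\right\}^{2}\right] &= 2\gamma(h),
\end{align}
```

for all locations x, so that both the mean and the variance of the increments depend only on the separation h and not on position.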

The less serious breach of this assumption is a non-stationary mean, which can effectively be dealt with by using generalized covariances (Matheron, 1973) or the empirical best linear unbiased predictor with appropriate fixed effects (e.g., universal kriging; Meul and Van Meirvenne, 2003). A more serious violation of this assumption is when the variance is non-stationary, and this is the focus of this paper. Non-stationary variance can take two forms. In the first, the variance changes as a function of space, i.e. the local variance of a soil property differs between locations. The second form occurs when the distribution of the variance between scales (or spatial frequencies) changes in space. These changes in the scale-dependent distribution of variance imply a change in the autocorrelation function. Furthermore, variability of soil processes may well be non-stationary as a consequence of both changes in the variance and in the autocorrelation function.
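The two forms can be told apart with a toy simulation. The sketch below is only an illustration under stated assumptions: it uses a first-order autoregressive series on a transect as a stand-in for a spatial process, and the names `ar1`, `lag1_corr`, `form1` and `form2` and all parameter values are hypothetical.

```python
import numpy as np

def ar1(n, phi, rng):
    """Unit-variance AR(1) series; the autocorrelation at lag h is phi**h."""
    z = np.empty(n)
    z[0] = rng.standard_normal()
    for t in range(1, n):
        z[t] = phi * z[t - 1] + np.sqrt(1.0 - phi ** 2) * rng.standard_normal()
    return z

def lag1_corr(z):
    """Sample correlation between neighbouring points on the transect."""
    return np.corrcoef(z[:-1], z[1:])[0, 1]

rng = np.random.default_rng(1)
n = 2000  # points in each half of the transect

# Form 1: the autocorrelation is the same everywhere, but the standard
# deviation doubles (variance quadruples) in the second half.
form1 = ar1(2 * n, 0.8, rng) * np.repeat([1.0, 2.0], n)

# Form 2: the variance is the same everywhere, but the autocorrelation
# changes from short-range (phi = 0.3) to long-range (phi = 0.95).
# The two halves are simulated independently, so there is a break at the join.
form2 = np.concatenate([ar1(n, 0.3, rng), ar1(n, 0.95, rng)])

print("form 1 variances:", form1[:n].var(), form1[n:].var())
print("form 2 lag-1 correlations:", lag1_corr(form2[:n]), lag1_corr(form2[n:]))
```

In the first series a comparison of local variances reveals the change, whereas in the second the local variances are similar and only a statistic sensitive to the autocorrelation (such as a local range or spatial dependence ratio) would pick it up.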

It is easy to conceive of spatial processes in which it is not plausible to assume that the underlying covariance structures are stationary over the scales of interest. For example, pollution dispersion models in soils depend on certain characteristic soil properties such as texture, which can govern contaminant sorption or movement in the soil media. If empirical spatial covariances or empirical variograms are used to estimate texture across a landscape characterised by alluvial and glacial deposits as well as areas where erosion is prevalent, the assumption of an underlying random function which is stationary in the variance is implausible because of changes in local variation of texture. Consider measurements of soil carbon content on a transect that covers different parent materials, land use and vegetation. The characteristic spatial scales of variation in soil carbon content of clay soils under forestry may be largely determined by the structure of the forest canopy and management units for silvicultural production. If the transect also passes through a rapidly changing ecotone, from forest to wetland to grassland, then the pattern of variation in soil carbon content may be of a larger range than in the woodland in this region of the transect, but with an important short-range pattern in the wetland, reflecting carbon ‘hot spots’ due to microtopographic variation. If we compute a variogram for the whole transect, we assume, implausibly, that all this variation can be treated as a realization of a single stationary random function. In fact, the variogram does not represent the variation in soil carbon content anywhere in the forest because the variability changes so markedly. Other examples where stationarity might be implausible are given by Lark (2006) and Sampson and Guttorp (1992).

The simplest approach to address changing variances is to partition the sampling region into segments within which stationarity is a more reasonable assumption (McBratney et al., 1991). There are also more complex techniques that can manage both forms of non-stationary variances, e.g. deformation of the original rectilinear coordinate space to obtain an alternative space in which stationarity is plausible (Sampson and Guttorp, 1992).

Currently the practitioner must infer where stationarity cannot plausibly be assumed by observing directional variogram behaviour (possibly after removing a trend), by comparing variograms from portions of the sampling area, or from some underlying knowledge about the variable and processes being modelled. This could be facilitated by methods to test the plausibility of an underlying stationary random function, and to identify how this assumption appears to fail. Lark (2007) showed that it is possible to test the assumption of stationarity in variance using the discrete wavelet transform and Fuentes (2005) derived a spectral based test for stationarity. However, the wavelet tests require that the data are obtained from a uniform sampling on a square grid or transect, which is not the most efficient approach to estimating the variogram.

We probably cannot assume that many soil properties are a realization of a stationary random function because the variance and/or the autocorrelation function appear to change. It is also the case that geostatistical methods that rely on the assumption of stationarity are increasingly being used in soil science and that methods exist that can alleviate some of the effects of non-stationarity. However, there are currently few approaches to test for non-stationarity of spatial processes. In this paper, we propose an inferential method to test the plausibility of assumptions of stationarity in the variance given that the underlying model is a random function. The method tests for changes in the variance and autocorrelations.
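The Monte Carlo logic of the proposed test, as outlined in the abstract, can be sketched in a few lines. This is a deliberately simplified illustration, not the authors' implementation: it uses only the local-variance statistic, it replaces the locally fitted sill (c0 + c1) with the plain sample variance in a window of the k nearest sites, it takes the global spherical parameters `c0`, `c1` and `a` as given rather than fitting them, and it uses a one-sided p-value. All function names and default values are hypothetical.

```python
import numpy as np

def spherical_cov(h, c0, c1, a):
    """Covariance implied by a spherical variogram with nugget c0,
    structured variance c1 and range a (C(0) = c0 + c1, C(h) = 0 for h >= a)."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)
    return c1 * (1.0 - 1.5 * r + 0.5 * r ** 3) + c0 * (h == 0.0)

def iqr_local_variance(z, windows):
    """Interquartile range of the sample variances within each window."""
    local_var = z[windows].var(axis=1, ddof=1)
    q25, q75 = np.percentile(local_var, [25, 75])
    return q75 - q25

def monte_carlo_test(coords, z, c0, c1, a, k=30, n_sim=199, seed=0):
    """Return (observed statistic, simulated statistics, one-sided p-value).

    Assumes c0 > 0 and no duplicated sites, so that the covariance
    matrix is positive definite and the Cholesky factorisation succeeds.
    """
    rng = np.random.default_rng(seed)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    windows = np.argsort(dist, axis=1)[:, :k]   # each site and its k - 1 nearest neighbours
    observed = iqr_local_variance(z, windows)
    # Unconditional Gaussian realizations at the original sample sites,
    # generated under the null hypothesis of a single global variogram.
    chol = np.linalg.cholesky(spherical_cov(dist, c0, c1, a))
    sims = np.array([
        iqr_local_variance(chol @ rng.standard_normal(len(z)), windows)
        for _ in range(n_sim)
    ])
    # Large IQR = more heterogeneity between windows than stationarity predicts.
    p_value = (1 + np.sum(sims >= observed)) / (n_sim + 1)
    return observed, sims, p_value
```

A small p-value indicates that the local variances differ more between windows than would be expected for realizations of a stationary Gaussian random function with the supplied variogram. The same scheme would apply to the interquartile ranges of the locally fitted s and a, at the cost of fitting a variogram model in every window of every realization.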

Theory

Given the random function Z(x), strong stationarity occurs when at any finite number of points x1,…, xN, the joint distribution of Z(x1),…, Z(xN) is the same as that of Z(x1 + h),…, Z(xN + h) where h is some displacement. A weaker form of stationarity is the assumption that a covariance function, Cov[Z (x + h), Z(x)] exists and depends only on h (equivalent to stationarity of the mean and variance only). This form of stationarity is known as second order stationarity. A still weaker form of the

Case study

We now illustrate the inference approach on four soil properties from a large dataset at 1341 sites for a large area in Southern Florida encompassing the Everglades National Park, Big Cypress National Preserve in the south and west and Water Conservation Areas (WCA) 1, 2, and 3 in the north (Fig. 1). The entire extent of this area is 8248 km². The soils in the northern areas are predominantly peat, the limestone bedrock tends to be more influential on the soils towards the southern areas as the

Discussion

The method proposed in this paper computes local estimates of the variogram parameters, within a moving window, and compares their variability with that of the corresponding variograms for a realization of a stationary random function with the same global variogram as the real data. To our knowledge, it is one of a few approaches currently available to determine whether the assumption of intrinsic stationarity of the variance is appropriate for a given set of data. The disadvantages of using

Acknowledgements

The authors would like to thank Dr Ramesh Reddy and the Wetland Biogeochemistry Laboratory for providing us with the data. This work was funded by the Biotechnology and Biological Sciences Research Council under grant BB/C506813/1, and through its core grant to Rothamsted Research.

References (30)

  • Deutsch, C.V. et al. GSLIB: Geostatistical Software Library and User's Guide (1992)
  • Dobermann, A. et al. Geostatistical integration of yield monitor data and remote sensing improves yield maps. Agron. J. (2004)
  • Goovaerts, P. et al. Scale-dependent correlation between topsoil copper and cobalt concentrations in Scotland. Eur. J. Soil Sci. (1994)
  • Haas, T.M. Lognormal and moving window methods of estimating acid deposition. J. Am. Stat. Assoc. (1990)