Introduction
Resolving the main properties of the spatially variable hydraulic conductivity tensor
K is a key issue in groundwater modeling. A groundwater flow model must not only impose appropriate initial and boundary conditions but also represent the main geological features of the investigated aquifer adequately. In fluvial gravel aquifers, this implies considering bedded subsurface features defined by the deposition of sediments of different size, geometry and sorting (Borghi et al.
2015, Bennett et al.
2019). Since the stratification of sediments affects the spatial distribution of hydraulic properties (Koltermann and Gorelick
1996; Heinz and Aigner
2003; Heinz et al.
2003), a major concern when conceptualizing the conductivity distribution in a groundwater model of a fluvial aquifer is to identify and characterize an appropriate number of layers with different properties. The differences between individual strata also cause the formation-averaged hydraulic conductivity tensor
K to be anisotropic because groundwater flows preferentially along the layers rather than perpendicular to them (Bear
1972; Borghi et al.
2015). This implies that the principal directions of the effective hydraulic conductivity tensor
Keff are typically the horizontal and vertical directions, which are aligned with the strata. In this study, the ratio of horizontal to vertical conductivity, when averaging over the horizontal layers, is denoted the anisotropy ratio
ϑ. The hydraulic anisotropy is of importance whenever the vertical flow component is significant, for instance in flow close to partially penetrating wells, horizontal collector wells, or around objects that partially penetrate aquifers, or when considering river–groundwater exchange. While regional flow is predominantly horizontal, these specific boundary conditions induce a vertical-flow component that can be crucial in the overall design of groundwater management measures.
Many hydrogeological applications, such as the design of remediation systems, depend not only on precise information on subsurface heterogeneity (e.g. Cardiff and Barrash
2011; Zschornack et al.
2013), but also require information on hydraulic anisotropy (e.g. Bair and Lahm
1996; Zlotnik and Ledder
1996). A specific example in which hydraulic anisotropy is relevant is the delineation of capture zones of partially penetrating wells (Bair and Lahm
1996). Several experimental methods have been developed for resolving the spatial variability of hydraulic conductivity at different scales and degrees of resolution. Hydraulic tomography, for example, is a common method that helps to obtain larger-scale (>10 m) three-dimensional (3D) information on hydraulic-conductivity variations (Gottlieb and Dietrich
1995; Yeh and Liu
2000; Bohling
2009; Hochstetler et al.
2016; Sanchez-Leon et al.
2016). Direct-push methods such as direct-push injection logging (Bohling et al.
2002; Butler et al.
2007; Dietrich et al.
2008; Lessoff et al.
2010) or the direct-push permeameter (Butler et al.
2007; Chen et al.
2008,
2010; Klammler et al.
2011; Zschornack et al.
2013) resolve apparent horizontal hydraulic conductivity with depth and are thus well suited to investigate local hydraulic-conductivity variations in the vertical direction at high resolution.
To resolve the ratio of horizontal to vertical hydraulic conductivity, different field methods have been investigated. Klammler et al. (
2017) proposed a shape factor to estimate the bulk hydraulic anisotropy from measurements obtained with the direct-push permeameter but without considering the vertical variability of hydraulic conductivity. The tomographic slug test proposed and tested by Paradis et al. (
2015,
2016) in a littoral aquifer seems to be more appropriate for resolving hydraulic anisotropy induced by heterogeneities at smaller scales, also in the horizontal direction, e.g., from cross-bedding. A specific limitation of the tomographic slug test, however, is the very small range of investigation in the horizontal direction (typically <10 m), especially in highly permeable aquifers (Paradis et al.
2015). Even though different studies have dealt with the investigation of anisotropic conductivity, a suitable field method for estimating the ratio of horizontal to vertical hydraulic conductivity on larger scales in fluvial gravel aquifers has not yet been tested. The aim of this study is to examine the viability and benefits of a method for resolving hydraulic anisotropy on larger scales induced by the vertical heterogeneity on smaller scales.
The present study builds upon a method introduced by Maier et al. (
2020) to estimate hydraulic anisotropy by inverting steady-shape aquifer tests using a partially penetrating pumping well. The approach follows the basic principles of hydraulic tomography: a series of pumping tests is performed in which groundwater is sequentially extracted from different intervals of a single pumping well, and the hydraulic response is observed in surrounding observation wells placed at different distances and depths. In contrast to the steady-state pumping regime, the absolute drawdowns are still changing in the steady-shape pumping regime, but the hydraulic-head differences between observation locations remain constant (Bohling et al.
2002, Bohling et al.
2007).
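The steady-shape behavior can be illustrated with a minimal sketch based on the Cooper-Jacob late-time approximation for a fully penetrating well in a homogeneous confined aquifer. This is a simplified stand-in chosen for illustration only, not the partially penetrating configuration analyzed in this study, and the pumping rate, transmissivity, storativity, and distances are hypothetical values.

```python
import math


def cooper_jacob_drawdown(Q, T, S, r, t):
    """Late-time Cooper-Jacob drawdown [m] at radial distance r and time t.

    Assumes a fully penetrating well in a homogeneous confined aquifer,
    which is a simplification compared with the tests in this study.
    """
    return Q / (4.0 * math.pi * T) * math.log(2.25 * T * t / (r ** 2 * S))


# Hypothetical values, chosen only for illustration.
Q = 0.01                       # pumping rate [m^3/s]
T = 0.05                       # transmissivity [m^2/s]
S = 1e-3                       # storativity [-]
r_near, r_far = 5.0, 20.0      # radial distances of two observation wells [m]
times = [3600.0, 7200.0, 14400.0]  # times since pumping started [s]

near = [cooper_jacob_drawdown(Q, T, S, r_near, t) for t in times]
far = [cooper_jacob_drawdown(Q, T, S, r_far, t) for t in times]
differences = [s_n - s_f for s_n, s_f in zip(near, far)]

for t, s_n, s_f, d in zip(times, near, far, differences):
    print(f"t = {t:7.0f} s  s_near = {s_n:.4f} m  s_far = {s_f:.4f} m  diff = {d:.4f} m")
```

In this approximation both drawdowns keep growing with time, whereas their difference equals Q/(2πT)·ln(r_far/r_near) and is therefore independent of time.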
In this work, the method described by Maier et al. (
2020) is modified and applied to a fluvial gravel aquifer located in the Upper Rhine Valley at the Germany-France border. Specifically, the question of how a homogeneous anisotropic groundwater flow model performs in comparison to models with several anisotropic layers is addressed, as well as how these layers should be defined.
While Maier et al. (
2020) described the application of steady-shape aquifer tests to jointly optimize modeling and measurement strategies with a synthetic scenario, the present study considers a field application, including the design of the experiments.
This paper starts with a brief review of the underlying theory, followed by a description of the field application. Then the numerical models are described, and the principles used in model calibration are outlined. After presenting the site-specific results, the paper closes by discussing the main findings and giving general recommendations.
Model calibration
All three models were calibrated independently with the trust-region-reflective least-squares algorithm implemented in the function lsqnonlin of the MATLAB Optimization Toolbox (Coleman and Li 1996). To reduce the large data volume in model calibration, the averaged drawdown measurements \( s_{\mathrm{meas}} \) from all three hydraulic tests were considered jointly, leading to \( n_{\mathrm{meas}} = 3 \times 57 = 171 \) drawdown observations.
As mentioned before, a steady-shape pumping regime was considered in the simulations, in which drawdown differences between observation locations remain constant. Typically, this requires the specification of pairs of observation points by either setting one observation location as the superordinate reference point (Maier et al.
2020) or by considering all feasible pairs of observation points (Bohling et al.
2002). Each field measurement, however, is subject to measurement errors of different types, including measurement noise or the misplacement of observation wells (Maier et al.
2020). In trials not reported here, different observation points were tested as the reference point, and the calibration results differed because of measurement error. To avoid propagating the uncertainty of individual observations into the pairs of observation points, the model calibration includes a virtual reference point. That is, for each hydraulic test, the simulated drawdown difference \( s_{\mathrm{sim}} = \left| s_{\mathrm{t}} - s_{\mathrm{ref}} \right| \) comprises the simulated steady-state drawdown \( s_{\mathrm{t}} \) and the drawdown at a virtual reference point \( s_{\mathrm{ref}} \), which is identical for all measurement points but needs to be estimated together with the hydraulic-conductivity values.
Then the differences between the simulated and measured drawdowns, \( s_{\mathrm{sim}} \) and \( s_{\mathrm{meas}} \), were computed and normalized by the error \( \sigma_i \) of each measurement \( i \), which is defined by an error model discussed below.
In the calibration, the objective function \( \varphi \) to be minimized is defined as the sum of squared normalized residuals:
$$ \varphi =\sum \limits_{i=1}^{n_{\mathrm{meas}}}{\left(\frac{s_{\mathrm{sim},i}\left(\mathbf{p}\right)-{s}_{\mathrm{meas},i}}{\sigma_i}\right)}^2 $$
(9)
in which \( \mathbf{p} \) is the parameter vector including the logarithms of \( K_{\mathrm{r}} \) and \( K_{\mathrm{z}} \) of all horizontal layers considered and the reference drawdown \( s_{\mathrm{ref}} \) for each of the three hydraulic tests. Thus, in total, the 1-, 3- and 5-layer models include \( n_{\mathrm{par}} = 3 \), \( n_{\mathrm{par}} = 7 \), and \( n_{\mathrm{par}} = 11 \) calibration parameters, respectively.
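As a minimal sketch of how Eq. (9) can be evaluated for a single hydraulic test, the following lines first form the drawdown differences relative to the virtual reference point and then sum the squared normalized residuals. The function names and all numerical values are hypothetical; in the actual calibration the simulated drawdowns come from the numerical flow model and each test has its own reference drawdown.

```python
def steady_shape_drawdown(s_t, s_ref):
    """Drawdown differences |s_t - s_ref| relative to a virtual reference point."""
    return [abs(s - s_ref) for s in s_t]


def objective(s_sim, s_meas, sigma):
    """Sum of squared residuals, each normalized by its error sigma_i (Eq. 9)."""
    return sum(((sim - meas) / sig) ** 2
               for sim, meas, sig in zip(s_sim, s_meas, sigma))


# Hypothetical values for one hydraulic test (all in metres).
s_t = [0.30, 0.22, 0.15]      # simulated drawdowns at the observation points
s_ref = 0.10                  # virtual reference drawdown (a fitted parameter)
s_meas = [0.21, 0.11, 0.05]   # measured drawdown differences
sigma = [0.01, 0.01, 0.01]    # errors taken from the error model

s_sim = steady_shape_drawdown(s_t, s_ref)
phi = objective(s_sim, s_meas, sigma)
print(f"phi = {phi:.3f}")
```

With these made-up numbers, two of the three residuals equal one error unit and the third vanishes, so the objective function is close to 2.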
The error model accounts for the combined effects of the reproducibility error, a potential measurement bias (e.g., due to misplacement of the observation points), and, most importantly, the model-conceptual error (e.g., due to a suboptimal definition of layers or neglected 3D heterogeneity). None of the models is claimed to be a perfect representation of reality. Misfits larger than the measurement error itself are therefore accepted in order to keep the hydrogeological models comparably simple and the fitted parameters meaningful. This framework requires a heteroscedastic error model whose parameters become part of the fitting procedure. Because different models have different deficiencies, they have different model errors, and the quality of the models is judged by the fitted coefficients of the error model. After testing different error models, which for the sake of brevity are not presented here, the following parameterization appeared to represent the behavior of the residuals reasonably well:
$$ \sigma =a+\frac{b\,{s}_{\mathrm{meas}}^2}{c+{s}_{\mathrm{meas}}} $$
(10)
with \( a \), \( b \) and \( c \) being the error-model parameters. This error model starts with a constant error, \( \underset{s_{\mathrm{meas}}\to 0}{\lim}\sigma =a \), corresponding to the absolute error, then increases quadratically with the measurement, and converges to a linear dependence on \( s_{\mathrm{meas}} \) for large values, with \( \underset{s_{\mathrm{meas}}\to \infty }{\lim}\frac{\sigma }{s_{\mathrm{meas}}}=b \) corresponding to the relative error. The parameter \( c \) controls how quickly the error model transitions from the measurement-independent to the linear regime.
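The limiting behavior of Eq. (10) can be checked numerically with a few lines of code. The coefficient values below are hypothetical and serve only to show the three regimes.

```python
def error_model(s_meas, a, b, c):
    """Heteroscedastic error of Eq. (10): sigma = a + b * s^2 / (c + s)."""
    return a + b * s_meas ** 2 / (c + s_meas)


# Hypothetical coefficients: a [m], b [-], c [m].
a, b, c = 0.002, 0.05, 0.1

sigma_at_zero = error_model(0.0, a, b, c)          # constant regime, equals a
small = 1e-4
quadratic_part = error_model(small, a, b, c) - a   # close to b * small^2 / c
large = 1e6
relative_error = error_model(large, a, b, c) / large  # approaches b

print(sigma_at_zero, quadratic_part, relative_error)
```

For drawdowns much smaller than c the excess over a grows like b·s²/c, and for drawdowns much larger than c the ratio σ/s approaches b.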
The error-model parameters are determined by calibrating the 1-, 3-, and 5-layer models according to the expectation-maximization method (Dempster et al. 1977). The scheme alternates between two steps: (1) minimizing the objective function with the trust-region-reflective algorithm of lsqnonlin (Coleman and Li 1996) for given coefficients of the error model, and (2) updating the error-model parameters by a least-squares fit of the error model to the absolute residuals \( \left| s_{\mathrm{sim}}\left(\mathbf{p}\right) - s_{\mathrm{meas}} \right| \) between the fitted model and the measured drawdown \( s_{\mathrm{meas}} \). With this, the error-model parameters \( a \), \( b \) and \( c \), as well as all model parameters, are included in the optimization process. The iterative calibration procedure is completed when the change in all model and error parameters is less than 1%. The comparison of the different models is thus based not on meeting the observations within the measurement error but on the magnitude of the model error needed to accept each model. In the following, the goodness of fit is assessed by comparing the resulting absolute and relative errors between the 1-, 3-, and 5-layer models.
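The alternating structure of this procedure and its 1% stopping rule can be sketched as follows. The two inner fits are passed in as placeholder functions, and the stopping rule is interpreted here as the largest relative parameter change, which is an assumption. The toy example replaces the flow model with a one-parameter linear model and the three-coefficient error model with a single constant error, purely to exercise the loop; it is not the calibration used in this study.

```python
def relative_change(old, new):
    """Largest relative change between two parameter lists."""
    return max(abs(n - o) / max(abs(o), 1e-12) for o, n in zip(old, new))


def alternate_until_converged(fit_model, fit_error_model, p0, e0,
                              tol=0.01, max_iter=50):
    """Alternate model fit and error-model fit until all changes are below tol."""
    p, e = list(p0), list(e0)
    for iteration in range(1, max_iter + 1):
        p_new = fit_model(e)            # minimize objective for fixed error model
        e_new = fit_error_model(p_new)  # refit error model to absolute residuals
        done = relative_change(p + e, p_new + e_new) < tol
        p, e = list(p_new), list(e_new)
        if done:
            return p, e, iteration
    raise RuntimeError("no convergence within max_iter iterations")


# Toy stand-ins (hypothetical data): model s = p * x, constant error sigma.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]


def fit_model(e):
    weight = 1.0 / e[0] ** 2  # weighted least squares, closed form
    return [sum(weight * xi * yi for xi, yi in zip(x, y))
            / sum(weight * xi * xi for xi in x)]


def fit_error_model(p):
    residuals = [abs(p[0] * xi - yi) for xi, yi in zip(x, y)]
    return [sum(residuals) / len(residuals)]


p, e, n_iter = alternate_until_converged(fit_model, fit_error_model, [1.0], [1.0])
print(p, e, n_iter)
```

Because the toy error is constant, the weights cancel in the model fit and the loop stops after the second pass; with a drawdown-dependent error model the two steps influence each other and more passes are typically needed.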
After fitting the models, the standard deviation of estimation \( {\hat{\sigma}}_{p_i} \) of each model parameter \( p_i \) is first computed by linearized error propagation:
$$ {\hat{\sigma}}_{p_i}=\sqrt{{\mathbf{C}}_{\mathbf{pp}}\left(i,i\right)} $$
(11)
with the parameter covariance matrix \( {\mathbf{C}}_{\mathbf{pp}} \) computed by:
$$ {\mathbf{C}}_{\mathbf{pp}}=\frac{\varphi }{n_{\mathrm{meas}}-{n}_{\mathrm{par}}}{\left({\mathbf{J}}^T{\boldsymbol{\Sigma}}^{-1}\mathbf{J}\right)}^{-1} $$
(12)
in which the Jacobian \( \mathbf{J} \) contains the partial derivatives of all simulated measurements with respect to all parameters, and \( \boldsymbol{\Sigma} \) is the diagonal matrix of the squared errors according to the error model. Because the parameters \( a \) and \( b \) of the error model are larger if the model shows larger misfits, the resulting parameter standard deviations of estimation are also larger.
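Equations (11) and (12) can be evaluated with a few lines of linear algebra. The sketch below is written in plain Python with a small Gauss-Jordan inverse so that it has no dependencies; the Jacobian, the errors, and the value of the objective function are hypothetical, and a singular normal matrix is not handled.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * pc for v, pc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def parameter_std(J, sigma, phi):
    """Standard deviations of estimation from Eqs. (11) and (12)."""
    n_meas, n_par = len(J), len(J[0])
    # J^T Sigma^-1 J with Sigma = diag(sigma_i^2)
    normal = [[sum(J[i][j] * J[i][k] / sigma[i] ** 2 for i in range(n_meas))
               for k in range(n_par)] for j in range(n_par)]
    scale = phi / (n_meas - n_par)
    cov = [[scale * v for v in row] for row in invert(normal)]
    return [math.sqrt(cov[i][i]) for i in range(n_par)]


# Hypothetical example: four measurements, two parameters.
J = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
sigma = [0.1, 0.1, 0.1, 0.1]
phi = 2.0

std = parameter_std(J, sigma, phi)
print(std)
```

In this example the normal matrix is diagonal with entries of 300 and the scaling factor is 1, so both standard deviations equal the square root of 1/300.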
To account for nonlinearity, the uncertainty estimate of the model parameters is refined by applying a Markov-chain Monte Carlo (MCMC) method with Metropolis-Hastings sampling to the hydraulic-conductivity values, starting from the best estimate of the preceding optimization and keeping the coefficients of the error model as well as the reference drawdown values \( s_{\mathrm{ref}} \) fixed. This leads to a sample of 1,000 parameter realizations for each model, drawn from the posterior distribution. The results of the MCMC sampling are given in section S6 of the ESM.
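A minimal sketch of such a sampler is given below. It uses a random-walk proposal with a symmetric Gaussian step, which is the Metropolis special case of Metropolis-Hastings; the proposal type, the step size, and the toy log-posterior (independent Gaussians in log-conductivity around made-up best estimates) are assumptions for illustration. In the actual application, each evaluation of the posterior would require a run of the flow model.

```python
import math
import random


def metropolis_hastings(log_posterior, start, step, n_samples, seed=0):
    """Random-walk Metropolis sampler with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_posterior(current)
    chain, n_accepted = [], 0
    for _ in range(n_samples):
        proposal = [value + rng.gauss(0.0, step) for value in current]
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
            n_accepted += 1
        chain.append(list(current))  # a rejected step repeats the current state
    return chain, n_accepted / n_samples


# Hypothetical stand-in for the posterior of log10(K_r) and log10(K_z).
BEST_ESTIMATE = [-3.0, -4.0]  # made-up values, K in m/s
SPREAD = 0.2                  # made-up posterior standard deviation


def toy_log_posterior(log_k):
    return -0.5 * sum(((v - m) / SPREAD) ** 2
                      for v, m in zip(log_k, BEST_ESTIMATE))


chain, acceptance_rate = metropolis_hastings(
    toy_log_posterior, BEST_ESTIMATE, step=0.2, n_samples=1000)
mean_log_kr = sum(sample[0] for sample in chain) / len(chain)
print(len(chain), round(acceptance_rate, 2), round(mean_log_kr, 2))
```

Starting the chain at the best estimate of the preceding optimization, as described above, avoids a long burn-in phase in this sketch.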
Finally, the radial and vertical conductivities \( K_{\mathrm{r}} \) and \( K_{\mathrm{z}} \) are upscaled to the full aquifer thickness, resulting in the effective radial and vertical conductivities \( {K}_{\mathrm{r}}^{\mathrm{eff}} \) and \( {K}_{\mathrm{z}}^{\mathrm{eff}} \), defined as the arithmetic and harmonic means of layer-specific values, respectively:
$$ {K}_{\mathrm{r}}^{\mathrm{eff}}=\frac{1}{z_{\mathrm{top}}-{z}_{\mathrm{bot}}}{\int}_{z_{\mathrm{bot}}}^{z_{\mathrm{top}}}{K}_{\mathrm{r}}\left(\zeta \right) d\zeta $$
(13)
$$ {K}_{\mathrm{z}}^{\mathrm{eff}}=\left({z}_{\mathrm{top}}-{z}_{\mathrm{bot}}\right){\left({\int}_{z_{\mathrm{bot}}}^{z_{\mathrm{top}}}\frac{1}{K_{\mathrm{z}}\left(\zeta \right)} d\zeta \right)}^{-1} $$
(14)
From this, the anisotropy ratio \( \vartheta \) is calculated by:
$$ \vartheta =\frac{K_{\mathrm{r}}^{\mathrm{eff}}}{K_{\mathrm{z}}^{\mathrm{eff}}} $$
(15)
The calculation of \( {K}_{\mathrm{r}}^{\mathrm{eff}} \), \( {K}_{\mathrm{z}}^{\mathrm{eff}} \), and ϑ is also performed for each realization of the MCMC ensemble, resulting in distributions of these quantities.
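For piecewise-constant layers, the integrals in Eqs. (13) and (14) reduce to thickness-weighted arithmetic and harmonic means. The following sketch shows the upscaling and the anisotropy ratio of Eq. (15) for a hypothetical three-layer profile; the layer thicknesses and conductivities are made-up values.

```python
def effective_conductivities(thickness, K_r, K_z):
    """Upscale layer values to the full thickness (Eqs. 13-15).

    thickness: layer thicknesses [m]; K_r, K_z: layer conductivities [m/s].
    Returns (K_r_eff, K_z_eff, anisotropy ratio).
    """
    total = sum(thickness)
    K_r_eff = sum(d * k for d, k in zip(thickness, K_r)) / total  # arithmetic mean
    K_z_eff = total / sum(d / k for d, k in zip(thickness, K_z))  # harmonic mean
    return K_r_eff, K_z_eff, K_r_eff / K_z_eff


# Hypothetical profile with a less permeable layer in the middle.
thickness = [2.0, 1.0, 2.0]
K_r = [1e-3, 1e-4, 1e-3]
K_z = [1e-4, 1e-5, 1e-4]

K_r_eff, K_z_eff, theta = effective_conductivities(thickness, K_r, K_z)
print(f"K_r_eff = {K_r_eff:.2e} m/s, K_z_eff = {K_z_eff:.2e} m/s, theta = {theta:.2f}")
```

In this example every layer has a local ratio of 10, whereas the upscaled anisotropy ratio is about 23, because the low-conductivity layer dominates the harmonic mean in the vertical direction but has little influence on the arithmetic mean in the radial direction. Applying the same function to each MCMC realization yields the distributions of the upscaled quantities.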
Conclusions
This work has tested an approach for estimating the hydraulic anisotropy induced by vertical heterogeneity in stratified aquifers. The approach is based on calibrating groundwater flow models using data from sequential hydraulic tests with partially penetrating wells, in which water is extracted from different aquifer depths and the hydraulic response is measured at different radial and vertical distances from the extraction screen (Maier et al. 2020). Pumping-test series with three extraction depths were performed in a fluvial gravel aquifer in southwest Germany, measuring more than 1,000 transient drawdown responses with a monitoring network of 58 observation points. These data were used to fit an anisotropic homogeneous model as well as locally anisotropic 3-layer and 5-layer models. The main target parameters were the radial and vertical hydraulic conductivities of each horizontal layer. The 3- and 5-layer models reproduced the observed drawdown measurements considerably better than the 1-layer model, particularly because one of the three test series showed larger drawdown values. The multilayer models could attribute these values to pumping from a less permeable layer, whereas the uniform model showed a systematic bias.
Based on the presented investigations, the following general recommendations for the design and analysis of pumping tests targeting hydraulic anisotropy are proposed:
1. The key element of the pumping tests is to extract water from a partially penetrating well. This induces a strong vertical flow component (at least in the vicinity of the pumping well), which is required to resolve the directional dependence of hydraulic conductivity in stratified aquifers.
2. The design of the pumping well considered in this study follows that of an extraction well used for dewatering measures in a large construction pit. Performing the pumping tests is not restricted to such a large-diameter well or to the screen lengths of the well considered in the present study. The well diameter should be chosen such that the induced cone of depression is sufficiently large while the drawdown remains within a measurable signal range.
3. Stressing the aquifer by extraction at different depths is mandatory. If water had been extracted only from a single depth (e.g., using the bottom well screen), the general vertical profile of hydraulic conductivity would most likely not have been detected.
4. Checking the reproducibility of the performed pumping tests by repeating the tests with different pumping rates and then rescaling the results to a common rate is highly recommended. In this study, averaging over the repeated tests also reduced the large data volume.
5. To avoid the challenges related to analyzing transient data or to reaching steady-state drawdown in field applications, the steady-shape analysis (Bohling et al. 2002; Bohling et al. 2007) is advantageous. Implementing a virtual reference point to compute drawdown differences is a reasonable alternative to computing drawdown differences from pairs of true observation points, in which inherent measurement errors are propagated.
6. If sufficient data are available, it is preferable to resolve the main vertical structure of hydraulic conductivity rather than to fit a uniform effective conductivity tensor. A better agreement between simulated and measured drawdowns, avoiding systematic bias, was achieved with the multilayer models than with the single-layer model. Upon upscaling, the anisotropy ratio resulting from the multilayer model was considerably larger. Also, identifying layers of preferential flow may be important both in solute-transport applications and in flow applications in which the vertical flow component occurs mainly at a specific depth, as in the dewatering scenario considered in the authors’ preceding theoretical study (Maier et al. 2020).
7. Selecting the right number and vertical positions of multiple layers is a challenge and may be prone to confirmation bias. As Zhao and Illman (2018) have illustrated, the use of information from prior hydrogeological investigations benefits model calibration. In this study, available lithologic information from the drilling profile of the pumping well proved to be a plausible decision guide for narrowing down potential layers, but in hindsight one layer per extraction screen turned out to be sufficient. Most likely, performing several flowmeter or direct-push injection-logging tests to see whether consistent layers of higher or lower conductivities exist across several vertical profiles would have been better for delineating hydraulically relevant layers than the grain-size data used here.
8. The true hydraulic conductivity in an aquifer will always be a spatially variable full 3 × 3 tensor. On the scale of pumping tests, however, horizontal variability is often smaller than the differences among the vertical layers. To justify the assumption of radial symmetry (neglecting horizontal heterogeneity and/or anisotropy), it was important to install observation wells in several directions from the pumping well.
Overall, the study has demonstrated the applicability of the proposed approach for resolving the vertical variability and anisotropy of potentially stratified aquifers. The experimental effort of installing a multisection partially penetrating well and multilevel observation wells is, of course, considerably higher than the effort associated with fully screened wells. This extra effort may be justified only in applications in which either significant vertical flow is to be expected, such as in riverbank-filtration setups or in the design of horizontal collector wells, or in which the identification of preferential-flow layers is crucial, as in solute-transport applications.