Optimal estuarine sediment monitoring network design with simulated annealing

https://doi.org/10.1016/j.jenvman.2005.04.024

Abstract

An objective function based on geostatistical variance reduction, constrained to the reproduction of the probability distribution functions of selected physical and chemical sediment variables, is applied to the selection of the best set of compliance monitoring stations in the Sado river estuary in Portugal. These stations were to be selected from a large set of sampling stations from a prior field campaign. Simulated annealing was chosen to solve the resulting optimisation model. Both the structure of the combinatorial problem and the resulting candidate sediment monitoring networks are discussed, and an optimal dimension and spatial distribution are proposed. An optimal network of sixty stations was obtained from an original 153-station sampling campaign.

Introduction

A well-designed, ongoing monitoring program is fundamental for the evaluation of the environmental management of natural systems (Kay and Alder, 2000). The design of an effective monitoring program depends on the management objectives, the resources (funding and staff) and the available technology. Monitoring programmes should be designed to contribute to a synthesis of information, to evaluate impacts, or to analyse the complex cross-linkages between environmental quality aspects, impacts and socio-economic driving forces (RIVM, 1994).

The technical design of monitoring networks involves the determination of: (i) monitoring sites; (ii) monitoring frequencies; (iii) variables to be sampled; and (iv) duration of sampling (the last two are not discussed here because they are case-specific). Most of the research results in this area have been obtained in the context of statistical procedures (Sanders et al., 1983; Moss, 1986; IAHS, 1986; Cochran, 1977). These rely on the principle that there are several sources of uncertainty: measurement errors, inherent heterogeneity of the variables involved and, where modelling is used, simplifications and errors in both the modelling and the numerical/analytical solution phases. McBratney et al. (1981), as well as many other authors after them, indicated that uncertainties result from a lack of information, in both quality and quantity, concerning the systems under study, or from spatial and temporal variations of the parameters.

In many monitoring programs a first sampling stage with a large number of locations is undertaken, either because there is no prior information or because it is considered necessary to collect more data. This stage is usually planned to give statistical information about the variables under study and to allow their spatial covariance to be calculated. A second stage is then needed to transform the original set of sampling stations, with high cardinality, into a lower-cardinality set of monitoring stations. Probably the most widely used methods to reduce cardinality are those based on the maximisation of spatial accuracy or, in other words, on the minimisation of the variance of the estimation error, also known as variance reduction methods. This is usually carried out in the context of geostatistical theory (Matheron, 1963, 1965) and most frequently by interpolation with an unknown mean, i.e. by ordinary kriging. Other promising methods have been proposed for optimising monitoring network design, in particular those based on information theory, as in the articles by Amorocho and Espildora (1973), Caselton and Husain (1980), Caselton and Zidek (1984), Harmancioglu and Yevjevich (1987), Husain (1989) and Harmancioglu and Alspaslan (1992). Despite their elegance, these methods are limited by the need to assume a probability distribution for the variables, which may be unknown or difficult to determine; moreover, they are particularly well suited to variables with identical probability distributions (usually normal or lognormal). When soft and other sources of information are available, the Bayesian Maximum Entropy geostatistical method, first developed by George Christakos (Christakos, 1990, 1992), has been shown to outperform ordinary kriging (D'Or et al., 2001); it also has the advantage over the latter of not requiring the specification of particular probability distributions.

Kriging variance has been used extensively for monitoring network design. Examples can be found in the work of Bras and Rodríguez-Iturbe (1976), Rouhani (1985), Loaiciga (1989), Rouhani and Hall (1988), Pardo-Igúzquiza (1998), van Groenigen et al. (1999), van Groenigen and Stein (1998), and Nunes et al. (2004a,b).

Two categories of monitoring optimisation with variance reduction have been proposed: (i) the local approach (e.g. Amorocho and Espildora, 1973); and (ii) the global approach (e.g. Ahmed et al., 1988). In the first, the influence of each additional point is analysed separately: the total variance reduction after adding one point is easily computed by considering the individual values at each initial location or at the points in the vicinity of the point being estimated. In the global approach, average estimation variances are used, so global approaches provide only average answers to monitoring design questions. The global approach is useful for analysing designs still on the drawing board or for performing extensive redesigns aimed at maintaining the efficiency of a monitoring network, which may require the removal of poorly located sites. The local approach, on the other hand, is better suited to optimally expanding an existing network. The optimality in this case relates only to the additional points, which may not be acceptable if the original points are not optimal (Markus et al., 1999).
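To make the global criterion concrete, the sketch below computes the ordinary kriging estimation error variance at a point and averages it over an estimation grid. It is a minimal illustration under assumed settings: the spherical covariance model, its parameters and the function names are illustrative and are not the variogram fitted to the Sado sediment data.

import numpy as np

def spherical_cov(h, sill=1.0, rng=2000.0, nugget=0.0):
    # Isotropic spherical covariance model (illustrative parameters only).
    h = np.asarray(h, dtype=float)
    c = np.where(h < rng,
                 sill * (1.0 - 1.5 * h / rng + 0.5 * (h / rng) ** 3),
                 0.0)
    return np.where(h == 0.0, nugget + sill, c)

def ok_variance(stations, x0, cov=spherical_cov):
    # Ordinary kriging estimation error variance at point x0, given the
    # (n, 2) array of station coordinates.
    n = len(stations)
    d = np.linalg.norm(stations[:, None, :] - stations[None, :, :], axis=-1)
    A = np.empty((n + 1, n + 1))
    A[:n, :n] = cov(d)
    A[n, :n] = 1.0                 # unbiasedness constraint
    A[:n, n] = 1.0
    A[n, n] = 0.0
    b = np.empty(n + 1)
    b[:n] = cov(np.linalg.norm(stations - x0, axis=-1))
    b[n] = 1.0
    sol = np.linalg.solve(A, b)
    lam, mu = sol[:n], sol[n]
    return cov(np.array([0.0]))[0] - lam @ b[:n] - mu

def average_ok_variance(stations, grid):
    # Global criterion: average kriging variance over the estimation grid.
    return float(np.mean([ok_variance(stations, x0) for x0 in grid]))

A local criterion would instead record the change in ok_variance at, or around, individual locations when a single station is added or removed.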

The minimisation of average kriging variance was applied here to select the number and positions of sediment monitoring stations in the Sado river estuary, on the southwest coast of Portugal (Fig. 1), in such a way that the different physically and chemically homogeneous areas identified in a prior sampling campaign were taken into account. This monitoring network will later be integrated into an environmental data management system for the Sado Estuary as a decision support tool for local authorities. The Sado Estuary is an example where environmental problems are not well managed, owing to its high natural value and the diverse pressures for development, and where the right tools to help evaluate environmental quality status still need to be developed. The objective here was the development of a monitoring network that constitutes one of the information sources of the Sado Estuary management system (physico-chemical sediment quality data).

For practical and budgetary reasons the number of monitoring stations should be reduced to a minimum. The optimisation problem can be stated in a very simple way: maximise the spatial accuracy, subject to a maximum number of stations, given the information collected in a prior sampling program (153 sampling sites). Maximisation of spatial accuracy is easily attained by minimising the variance of the estimation error, though incorporating the patchiness of homogeneous areas is a more difficult problem. One alternative would be to fix several locations inside the different homogeneous areas, but then the choice of stations would be arbitrary. Another way is to use stratification, requiring that a defined number of stations be placed inside each homogeneous area. Stratification is a well-known statistical technique used for designing monitoring (or sampling) programs with denser networks in some areas than in others. The difference in sampling density may be based, for example, on spatial autocovariances, statistical risk of contamination, plume detection probabilities or empirical judgement, among many others. Here we propose a statistically based stratification: homogeneous areas are monitored according to the frequency with which they appear in the prior sampling program. The inclusion of homogeneous areas was considered important by the manager because sediment granulometry and physical and chemical characteristics are strongly correlated with the amount of xenobiotics the sediment can retain, and because these areas were planned to be the geographic spatial units of an environmental management system. Hence, four types of sediment were established on the basis of three physical and chemical variables, and the manager required that the proportions of stations among the four sediment types in the monitoring network be similar to those of the sampling campaign (thus the constraint on the proportions).
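As an illustration of this proportionality constraint, the sketch below derives per-class station quotas from the class frequencies observed in the sampling campaign. The function name, labels and rounding rule are assumptions for illustration, not the exact procedure used in the study.

import numpy as np

def class_quotas(sediment_class, n_monitor):
    # Allocate monitoring stations to sediment classes so that the network
    # reproduces the class proportions of the prior sampling campaign.
    classes, counts = np.unique(sediment_class, return_counts=True)
    proportions = counts / counts.sum()
    quotas = np.floor(proportions * n_monitor).astype(int)
    # Give the stations lost to rounding to the classes with the largest remainders.
    remainders = proportions * n_monitor - quotas
    for i in np.argsort(remainders)[::-1][: n_monitor - quotas.sum()]:
        quotas[i] += 1
    return dict(zip(classes, quotas))

# e.g. with the 153 sampling stations labelled by one of the four sediment
# types, class_quotas(labels, 60) gives the number of stations required per type.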

Optimisation then consists of finding an optimal subset: a combination of stations taken from a larger set. Even for relatively small set cardinalities the number of combinations is too high to allow them all to be evaluated exhaustively in a reasonable amount of time. One of the best-known algorithms for solving such combinatorial problems is simulated annealing, which has been widely used in sampling/monitoring network optimisation (e.g. Meyer et al., 1994; Pardo-Igúzquiza, 1998; van Groenigen et al., 1999; Brus et al., 2000, 2002; Nunes et al., 2004a,b).
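A minimal sketch of such a search is given below, assuming an objective such as the average kriging variance sketched earlier and the class quotas from the previous sketch; the swap move, the geometric cooling schedule and all parameter values are illustrative and are not those calibrated for this study.

import math
import random

def anneal_subset(candidates, classes, quotas, objective,
                  t0=1.0, alpha=0.95, steps_per_t=100, t_min=1e-4, seed=0):
    # Simulated annealing over subsets of candidate stations. 'classes' maps a
    # station to its sediment class; swaps are made within a class so that the
    # quotas (proportion constraint) hold throughout the search.
    rng = random.Random(seed)
    by_class = {c: [s for s in candidates if classes[s] == c] for c in quotas}
    current = [s for c, q in quotas.items() for s in rng.sample(by_class[c], int(q))]
    cost = objective(current)
    best, best_cost = list(current), cost
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            out_station = rng.choice(current)
            pool = [s for s in by_class[classes[out_station]] if s not in current]
            if not pool:
                continue
            in_station = rng.choice(pool)
            trial = [in_station if s == out_station else s for s in current]
            trial_cost = objective(trial)
            # Metropolis acceptance criterion.
            if trial_cost < cost or rng.random() < math.exp((cost - trial_cost) / t):
                current, cost = trial, trial_cost
                if cost < best_cost:
                    best, best_cost = list(current), cost
        t *= alpha  # geometric cooling
    return best, best_cost

In this setting 'objective' would wrap a routine such as average_ok_variance, evaluated over an estimation grid covering the estuary using the coordinates of the selected stations.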

The article is divided into five sections. This Introduction is followed by a second section in which the theoretical geostatistical and optimisation framework is presented. In that section the geostatistical parameter most frequently used to measure accuracy, the kriging estimation error variance, is explained and compared with another geostatistical measure of accuracy, the fictitious point estimation error variance; the simulated annealing heuristic used to solve the optimisation problem is also introduced. In the third section the case study is presented and the data transformations are explained, while in the fourth section the optimisation results are discussed. Finally, in the last section, the most important conclusions are drawn.

Section snippets

Estimation of probability distribution functions

Indicator coding consists of transforming a continuous or discrete variable, Z(x), into a discrete (0,1) variable, the indicator I(x). Considering a threshold value z_c on Z, I(x) is equal to 1 if Z(x) ≤ z_c, and 0 otherwise. The variable at each location is thereby transformed into a distribution function, i.e. the probability of not exceeding the threshold can be calculated within a region. With a sufficiently large number of thresholds the prior (and posterior) probability distribution of Z is calculated at each
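A minimal sketch of this indicator transformation, with made-up thresholds and values:

import numpy as np

def indicator_transform(z, thresholds):
    # For each threshold z_c, I(x) = 1 if Z(x) <= z_c and 0 otherwise.
    z = np.asarray(z, dtype=float)
    return np.stack([(z <= zc).astype(int) for zc in thresholds], axis=-1)

# A sediment variable coded at three thresholds (illustrative values):
z = np.array([1.2, 3.4, 0.7, 5.1])
I = indicator_transform(z, thresholds=[1.0, 3.0, 5.0])
# Column means of I estimate Prob[Z <= z_c], i.e. the distribution function
# of Z at the chosen thresholds.
print(I.mean(axis=0))   # -> [0.25 0.5 0.75]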

Study area and source data

The Sado Estuary is the second largest estuary in Portugal with an area of approximately 24,000 hectares. It is located on the west coast of Portugal, 45 km south of Lisbon (Fig. 1). Most of the estuary is classified as a nature reserve. The Sado Estuary basin is subject to intensive land-use practices and plays an important role in the local and national economy. Most of the activities in the estuary (e.g. industry, shipping, intensive farming, tourism and urban development) have negative

Feasible space

The number of combinations of Ω sampling stations with ω possible monitoring stations is given by the well-known formula $W = \frac{\Omega!}{(\Omega-\omega)!\,\omega!}$. Now, if one wants to calculate the combinations conditioned to the reproduction of the proportions, the expression becomes
$$W = \prod_{i=1}^{k} \frac{\Omega_i!}{(\Omega_i-\omega_i)!\,\omega_i!}$$

where i is the indicator number, $\Omega_i$ the number of sampling stations with indicator i, and $\omega_i$ the number of monitoring stations with indicator i imposed by conditioning. The number of combinations in each
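For a sense of scale, the feasible-space sizes can be computed directly, as below; the per-class counts are hypothetical and only illustrate how conditioning on the proportions shrinks the feasible space.

from math import comb, prod

# Unconstrained: choose omega monitoring stations out of Omega sampling stations.
Omega, omega = 153, 60
W_unconstrained = comb(Omega, omega)

# Conditioned on reproducing the proportions: product of per-class combinations.
# The split of the 153 stations among four sediment types is hypothetical.
Omega_i = [60, 45, 30, 18]   # sampling stations per indicator class
omega_i = [24, 18, 12, 6]    # monitoring stations imposed per class
W_conditioned = prod(comb(O, w) for O, w in zip(Omega_i, omega_i))

print(f"{W_unconstrained:.2e} combinations unconstrained, "
      f"{W_conditioned:.2e} after conditioning")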

Conclusions

The following conclusions can be drawn: (i) objective function conditioning is necessary to guarantee reproduction of the probability density functions of the indicator variables; (ii) the higher the conditioning, the closer the posterior (estimated) pdf is to the prior (data) pdf; (iii) conditioning with δ<0.3 leads to extremely long running times and has been shown to be unnecessary; (iv) if no conditioning is used the estimation error variance increases with the rise in the number of monitoring

References (47)

  • R.L. Bras et al., Network design for the estimation of areal mean of rainfall events, Water Resources Research (1976)
  • D.J. Brus et al., Designing efficient sampling schemes for reconnaissance surveys of contaminated bed sediments in water courses, Geologie en Mijnbouw-Netherlands Journal of Geosciences (2000)
  • D.J. Brus et al., Optimising two- and three-stage designs for spatial inventories of natural resources by simulated annealing, Environmental and Ecological Statistics (2002)
  • S. Caeiro et al., Delineation of estuarine management areas using multivariate geostatistics: the case of Sado Estuary, Environmental Science and Technology (2003)
  • W.F. Caselton et al., Hydrologic networks: information transmission, Journal of the Water Resources Planning and Management Division (1980)
  • W.G. Cochran, Sampling Techniques (1977)
  • H. Cohn et al., Simulated annealing: searching for an optimal temperature schedule, SIAM Journal on Optimization (1999)
  • G. Christakos, A Bayesian/maximum-entropy view to the spatial estimation problem, Mathematical Geology (1990)
  • G. Christakos, Random Field Models in Earth Sciences (1992)
  • C.V. Deutsch et al., GSLIB, Geostatistical Software Library and User's Guide (1992)
  • D. D'Or et al., Application of the BME approach to soil texture mapping, Stochastic Environmental Research and Risk Assessment (2001)
  • S. Geman et al., Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence (1984)
  • J.J. Gruijter et al., Continuous soil maps—a fuzzy set approach to bridge the gap between aggregation levels of process and distribution models, Geoderma (1997)