Statistics and GIS in environmental geochemistry — some problems and solutions

https://doi.org/10.1016/S0375-6742(98)00048-X

Abstract

Statistics and geographical information systems (GIS) are receiving increasing attention in environmental geochemistry. To apply these techniques well, however, it is important to know their functions and limitations, their advantages and disadvantages. Univariate statistics are useful for calculating means, identifying probability distributions and detecting outliers. Multivariate analysis plays an important role in studying relationships among variables. However, when dealing with regionalized variables in environmental geochemistry, conventional statistics show shortcomings, because they rest on assumptions made for random variables. Spatial analysis makes use of the spatial coordinates of the variables and also takes spatial correlation into account. Even so, these purely mathematical methods remain unsatisfactory, because the nature of environmental geochemistry is far from that simple. GIS provides visualization and some spatial analysis functions that draw on a large amount of spatial information. An expert system is useful for classification and prediction based on various types of information; however, the rule base for expert systems in environmental geochemistry is still too small and needs to be developed. Problems and possible solutions in applying statistics and GIS in environmental geochemistry are discussed, with examples based on the authors' experience in the Yangtze River basin, China, and in southeastern Sweden. Several ideas are put forward. A 'robust-symmetric mean' proposed by the authors is one of the best methods for calculating mean values. For the probability distribution of trace elements, the widely accepted 'log-normal distribution' is only a special case of the more general, and more adequate, 'positively skewed distribution'. A combination of univariate methods and PCA is used to detect outlying samples.
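
The abstract does not say how the 'robust-symmetric mean' is computed. The Python sketch below illustrates only the general idea its name suggests, under stated assumptions: transform positive concentrations toward symmetry with a Box-Cox power transform (λ = 0 is the logarithm, which is why the log-normal case is just one member of the wider positively skewed family), trim values outside mean ± k·SD in the transformed scale, then back-transform the trimmed mean. The λ grid from -2 to 2, the minimum-|skewness| criterion for choosing λ, k = 2 and all function names are hypothetical choices, not the authors' published procedure.

```python
import math
import statistics


def box_cox(x, lam):
    """Box-Cox power transform of a positive value; lam == 0 is the natural log."""
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def inv_box_cox(y, lam):
    """Inverse of box_cox for the same lam."""
    return math.exp(y) if lam == 0 else (lam * y + 1) ** (1 / lam)


def skewness(values):
    """Population moment coefficient of skewness."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    if s == 0:
        return 0.0
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def robust_symmetric_mean(data, k=2.0, max_iter=50):
    """Hypothetical sketch: symmetrize, trim, average, back-transform.

    data: positive, non-constant concentrations. Returns (mean, lam).
    """
    lambdas = [i / 10 for i in range(-20, 21)]  # assumed grid: -2.0 ... 2.0
    # 1. Choose the lambda whose transformed values are most nearly symmetric.
    lam = min(lambdas, key=lambda cand: abs(skewness([box_cox(x, cand) for x in data])))
    y = [box_cox(x, lam) for x in data]
    # 2. Iteratively drop values outside mean +/- k * SD (transformed scale).
    for _ in range(max_iter):
        m, s = statistics.fmean(y), statistics.stdev(y)
        kept = [v for v in y if abs(v - m) <= k * s]
        if len(kept) == len(y) or len(kept) < 3:
            break
        y = kept
    # 3. Back-transform the trimmed mean to concentration units.
    return inv_box_cox(statistics.fmean(y), lam), lam
```

For data that are already symmetric on a log scale the sketch selects λ = 0 and, if nothing is trimmed, returns the geometric mean; a single extreme value such as 1000 among values of 2 to 6 pulls the result far less than it pulls the arithmetic mean.
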
Partial least squares (PLS) regression, principal component analysis (PCA), cluster analysis, discriminant analysis and expert systems may be used to differentiate anthropogenic anomalies from the natural background. Spatial correlations among environmental geochemical variables are revealed by cross-variograms. An environmental information system integrating statistics, GIS, expert systems and environmental models should be established to further research in environmental geochemistry and to provide decision support.
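
The cross-variogram mentioned above measures how two variables co-vary as a function of separation distance h. A common experimental estimator is gamma_uv(h) = 1 / (2 N(h)) * sum [u(x_i) - u(x_j)] * [v(x_i) - v(x_j)], summed over the N(h) sample pairs whose separation falls in lag class h. The Python sketch below computes it omnidirectionally with fixed-width lag classes; the function name, the lag-class convention and the plain-list inputs are assumptions for illustration, not the paper's implementation.

```python
import math
from collections import defaultdict


def cross_variogram(coords, u, v, lag_width, n_lags):
    """Omnidirectional experimental cross-variogram (hypothetical helper).

    coords: (x, y) sample locations; u, v: values of two variables at them.
    Lag class k collects pairs with distance in [(k - 0.5) * w, (k + 0.5) * w).
    Returns (mean_distance, gamma, n_pairs) for each non-empty class 1..n_lags.
    """
    sums = defaultdict(float)
    dists = defaultdict(float)
    counts = defaultdict(int)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair counted once
            d = math.dist(coords[i], coords[j])
            k = int(d / lag_width + 0.5)
            if k == 0 or k > n_lags:
                continue
            sums[k] += (u[i] - u[j]) * (v[i] - v[j])
            dists[k] += d
            counts[k] += 1
    return [(dists[k] / counts[k], sums[k] / (2 * counts[k]), counts[k])
            for k in sorted(counts)]
```

With u equal to v the function reduces to the ordinary semivariogram; a negative cross-variogram indicates that the two variables tend to change in opposite directions over that distance.
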

Introduction

With the rapid development of computer technology, statistics and GIS are receiving increasing attention in environmental geochemistry. Applying these techniques well, however, requires an understanding of their functions and limitations, their advantages and disadvantages. Geochemists are interested in problems of probability distribution, mean calculation, spatial structure, correlation, database management, visualization, prediction, decision support, outlier detection, and differentiation of anthropogenic from natural backgrounds. This paper discusses these problems and possible solutions in applying statistics and GIS in environmental geochemistry, with examples based on the authors' experience in the Yangtze River basin, China, and in southeastern Sweden.

During the 1980s, two large environmental protection projects ('A Study on Background Values in the Yangtze River System', 1986–1990, and 'A Study on Background Values in the Dongting Lake System', 1981–1985) were carried out in China, and a large quantity of data was acquired. The authors have analysed these data with univariate, multivariate and spatial statistics (Zhang et al., 1995; Zhang and Selinus, 1997).

The Geological Survey of Sweden (SGU) initiated a national mapping program in 1982, collecting three types of geochemical samples (bedrock, till, and biogeochemical samples) with the objective of producing a detailed geochemical atlas of the entire country; the program is still in progress. All the data are stored in a database at SGU. The authors have investigated the calculation of mean values, outlier detection, relationships among elements, spatial distribution features, relationships among the different types of geochemical samples, and the differentiation of anthropogenic from natural backgrounds (Zhang et al., 1998a, Zhang et al., 1998b; Zhang and Selinus, 1998a, Zhang and Selinus, 1998b). Because of the large amount of data, a small area in southeastern Sweden, for which a wide variety of information is available, was chosen for study. The area measures 75 km × 75 km (Fig. 1). Because glaciers moved from north-northwest to south-southeast in this region during the last ice age (Lundqvist, 1994), bedrock lying to the north and west of the area contributes to the heavy metal concentrations in tills and roots within the study area. The bedrock map was therefore extended by 25 km to the north and 25 km to the west. A simplified overview of the geology of the project area is shown in Fig. 1.

Four major types of bedrock occur in the area: acid volcanic rocks, granitic rocks, basic rocks, and sedimentary rocks. Granites dominate, covering 64.3% of the area, followed by acid volcanic rocks at 21.5%. Basic rocks cover only 8.5%, shales, sandstones and limestones only 3.7%, and lakes 2.1%. Nevertheless, as will be shown, the basic rocks have a significant effect on metal distribution, because they have high concentrations of heavy metals (except Pb) and are easily weathered. Lead concentrations, on the other hand, are elevated in the acid volcanic rocks (Zhang et al., 1998b).


Problems in environmental geochemistry and possible solutions

When dealing with environmental geochemical data, one encounters questions of probability distribution, mean calculation, spatial structure, correlation, database management, visualization, prediction, decision support, outlier detection, and differentiation of anthropogenic from natural backgrounds. These problems can possibly be solved with the aid of statistics (univariate, multivariate, and spatial statistics), GIS, expert systems and environmental information…

Summary

The problems and possible solutions discussed above are summarized in Table 4.

Some points deserve emphasis. The 'robust-symmetric mean' proposed by the authors is one of the best estimators of the mean. A combination of univariate methods and PCA is used to detect outlying samples. PLS, PCA, cluster analysis and expert systems are useful for differentiating anthropogenic from natural anomalies. Cross-variograms reveal spatial correlations among environmental geochemical variables.
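
The text states that univariate methods and PCA are combined to detect outlying samples but gives no criteria. As an illustration only, the Python sketch below assumes a robust z-score (median and 1.4826 × MAD) on log-transformed concentrations for the univariate step, and, for the multivariate step, PCA of the correlation matrix of the log values followed by a Mahalanobis-type distance over the first k components (squared scores divided by their eigenvalues). The cut-offs z_limit = 3 and d_limit = 9.21 (the 0.99 quantile of chi-square with 2 degrees of freedom) and all function names are hypothetical, not the authors' settings.

```python
import math
import statistics


def _standardize(columns):
    """Centre and scale each variable (list of values) to unit sample SD."""
    out = []
    for col in columns:
        m, s = statistics.fmean(col), statistics.stdev(col)
        out.append([(x - m) / s for x in col])
    return out


def _top_eigenpairs(mat, k, iters=500):
    """Leading eigenpairs of a symmetric PSD matrix via power iteration + deflation."""
    p = len(mat)
    a = [row[:] for row in mat]
    pairs = []
    for _ in range(k):
        vec = [1.0 + 0.1 * i for i in range(p)]  # arbitrary non-symmetric start vector
        for _ in range(iters):
            w = [sum(a[r][c] * vec[c] for c in range(p)) for r in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            vec = [x / norm for x in w]
        lam = sum(vec[r] * a[r][c] * vec[c] for r in range(p) for c in range(p))
        pairs.append((lam, vec))
        # Remove the component just found so the next pass finds the next one.
        a = [[a[r][c] - lam * vec[r] * vec[c] for c in range(p)] for r in range(p)]
    return pairs


def flag_outliers(samples, k=2, z_limit=3.0, d_limit=9.21):
    """Return (univariate_flags, pca_flags) as sets of sample indices.

    samples: rows of positive concentrations (n samples x p elements).
    """
    cols = [[math.log(x) for x in col] for col in zip(*samples)]
    n, p = len(samples), len(cols)
    # Univariate step: robust z-score, (x - median) / (1.4826 * MAD), per element.
    uni = set()
    for col in cols:
        med = statistics.median(col)
        mad = 1.4826 * statistics.median([abs(x - med) for x in col])
        uni.update(i for i, x in enumerate(col) if mad > 0 and abs(x - med) / mad > z_limit)
    # Multivariate step: PCA of the correlation matrix of the log values.
    z = _standardize(cols)
    corr = [[sum(z[a][i] * z[b][i] for i in range(n)) / (n - 1) for b in range(p)]
            for a in range(p)]
    pairs = [(lam, vec) for lam, vec in _top_eigenpairs(corr, k) if lam > 1e-12]
    multi = set()
    for i in range(n):
        # Squared PC scores scaled by eigenvalues: a Mahalanobis-type distance.
        d = sum(sum(z[j][i] * vec[j] for j in range(p)) ** 2 / lam for lam, vec in pairs)
        if d > d_limit:
            multi.add(i)
    return uni, multi
```

One design caveat: with n samples, the squared distance of any single sample cannot exceed (n − 1)²/n (8.1 for n = 10), and an outlier included in the PCA inflates the very statistics used to judge it, so a fixed chi-square cut-off is insensitive in small data sets. This is one reason to combine the multivariate step with a robust univariate screen.
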

Acknowledgements

Dr. Chaosheng Zhang thanks the Swedish Institute for providing a scholarship that enabled him to visit the Geological Survey of Sweden. The study was partly supported by the 'One-hundred-person Plan' of the Chinese Academy of Sciences and by the National Natural Science Foundation of China. The authors thank Dr. Frank Manheim, Dr. Larry Gough and two anonymous reviewers for their helpful comments, which improved the paper.
