Multivariate statistical approach to estimate mixing proportions for unknown end members
Highlights
► Multivariate methods included principal components analysis and an end-member mixing model.
► The approach was tested under controlled conditions (i.e., true values of estimates were known a priori).
► The method was applied to field data from a groundwater study as an example application and partial method validation.
Introduction
Mixing calculations are used in several areas of science including hydrology (Carrera et al., 2004). Specifically, mixing calculations are used for studies that pertain to erosion rates (Allegre et al., 1996, Roy et al., 1999), emissions inventories (Biesenthal and Shepson, 1997), mineral dissolution, and atmospheric deposition (Johnson et al., 2001, Williams et al., 2001). Mixing calculations are based on the proportions of two or more end members, or sources of origin, that contribute to a specific mixture such as a water sample (Huntoon, 1981). In hydrology, mixing proportions can be estimated on the basis of hydrochemical data such as major ion chemistry and stable isotopes (Fritz et al., 1976, Christopherson and Hooper, 1992, Doctor et al., 2006).
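As an illustration of the basic mixing calculation when end-member compositions are already known, the following minimal sketch solves for the proportions of three end members from two conservative tracers plus the constraint that the proportions sum to one. The function names and the tracer values are hypothetical and are not taken from any of the cited studies.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def mixing_proportions(end_members, sample):
    """Proportions of three known end members in one mixed sample.

    end_members: three (tracer1, tracer2) pairs; sample: one (tracer1, tracer2) pair.
    Solves the two tracer mass balances plus sum(proportions) = 1 by Cramer's rule.
    """
    a = [[e[0] for e in end_members],
         [e[1] for e in end_members],
         [1.0, 1.0, 1.0]]
    b = [sample[0], sample[1], 1.0]
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("end members are collinear in tracer space")
    props = []
    for k in range(3):
        # Replace column k of the coefficient matrix with the right-hand side.
        a_k = [[b[i] if j == k else a[i][j] for j in range(3)] for i in range(3)]
        props.append(det3(a_k) / d)
    return props


# Hypothetical end members: (chloride in mg/L, delta-18O in per mil).
end_members = [(10.0, -12.0), (50.0, -8.0), (30.0, -15.0)]
# Sample constructed as 50% / 30% / 20% of the three end members.
sample = (26.0, -11.4)
props = mixing_proportions(end_members, sample)
print([round(p, 3) for p in props])  # approximately [0.5, 0.3, 0.2]
```

This direct solution applies only when the number of tracers is one less than the number of end members and the tracers mix conservatively; with more tracers the system is overdetermined and is usually solved by least squares.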
Principal component analysis (PCA) is another widely used technique for hydrochemical analysis in hydrological applications (Cao et al., 1998, Davis, 2002) and is a useful method in data reduction, manipulation, and visualization of complex data systems, where patterns and data similarities are not well understood (Melloul and Collin, 1992, Stauffer et al., 1985). PCA is a method that can be utilized to graphically plot complex multivariate datasets and elucidate data patterns that otherwise might not be noticed. This method attempts to organize large datasets into distinct patterns that reveal the most influential physical or chemical parameters (Woocay and Walton, 2008). More specifically, PCA is a linear transformation of multivariate data, where the transformed axes align with the greatest variance in the data (Davis, 2002).
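The idea that the transformed axes align with the greatest variance can be sketched in a few lines. The example below is a minimal illustration, not the algorithm used by any particular software package: it finds only the first principal component by power iteration on the sample covariance matrix, and it assumes that the starting vector is not orthogonal to that component. The function name and the data are hypothetical.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (loadings, scores) for the first principal component."""
    n, m = len(rows), len(rows[0])
    # Center each variable on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix (m x m).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Power iteration: repeated multiplication converges to the direction
    # of greatest variance (assumes the start vector is not orthogonal to it).
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # The sign of a component is arbitrary; make the largest loading positive.
    k = max(range(m), key=lambda j: abs(v[j]))
    if v[k] < 0:
        v = [-x for x in v]
    # Scores are the projections of the centered samples onto the component.
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, scores


# Hypothetical two-variable data lying exactly on the line y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
loadings, scores = first_principal_component(data)
print(loadings)  # proportional to (1, 2), normalized to unit length
```

In practice the variables are often standardized before PCA so that tracers with large numerical ranges do not dominate the components; that step is omitted here for brevity.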
Several studies have used PCA or related multivariate statistical techniques in hydrological settings. Moore et al. (2009) studied the relation between spring characteristics and karst aquifers using PCA, demonstrating how hydrochemistry can be used to simulate potential sources that affect spring discharge. Woocay and Walton (2008) used principal components factor analysis and cluster analysis to understand groundwater flow, hydrochemical facies, groundwater evolution, and groundwater interactions. Melloul and Collin (1992) demonstrated how PCA can be applied to a large quantity of complex data to identify groups of water that have similar characteristics and, furthermore, to identify the factors that could influence changes in hydrochemistry. Mahler et al. (2008) applied factor analysis to responses of storm water to major ions and turbidity, comparing two karst aquifers with distinct characteristics.
An end-member mixing model developed by Carrera et al. (2004) was a novel approach because it not only estimates mixing proportions but also end-member compositions when these compositions are not known; e.g., when end-member samples cannot be obtained. Christopherson and Hooper (1992) developed a model that combined end-member mixing with PCA to estimate mixing proportions through computation and knowledge of a given hydrological system. However, if end-member compositions are not well defined for a particular application, this model might not be conducive to defining mixing proportions (Christopherson and Hooper, 1992). Doctor et al. (2006) applied this model using major ion chemistry and stable isotopes of water to estimate mixing proportions from three identified end members. A similar approach to Christopherson and Hooper (1992) was taken by Laaksoharju et al. (1999) to develop a multivariate mixing and mass balance model that included PCA.
Long and Valder (2011) applied PCA with end-member mixing analysis to characterize groundwater flow patterns in karst aquifers in the Black Hills of South Dakota, USA; mixing proportions for wells and springs and the hydrochemistry of end members were estimated, and PCA was used to help relate these end members to hydrogeologic domains. Long and Valder (2011) could not, however, verify their method because none of the estimates were known a priori. In this paper, we apply a method similar to that of Long and Valder (2011) but use synthetically generated data as a surrogate for observation data; because the mixing proportions and end-member hydrochemistries are then known a priori, the estimates can be verified.
The approach herein has three main components: a basic conceptual model of the flow system, PCA, and an end-member mixing model. Conceptual model development, as the initial step in the approach, provides a physical basis for the method application and is used to help understand the meaning of the mathematical and statistical models and results in connection with the hydrology of the study area. Conceptual models may be based on topographic maps, potentiometric surface maps, geologic maps, or field reconnaissance. PCA was used as the second step in the approach for constraining and identifying the end members within a dataset. Next, an end-member mixing model was used to estimate mixing proportions and end-member compositions by inverse parameter estimation. Finally, the estimated end members were added to the original dataset and analyzed by PCA to assess the relation of the estimated end members to sampled data.
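The step of estimating mixing proportions and end-member compositions together can be illustrated with a deliberately simplified sketch. The code below is not the inverse model used in this paper or that of Carrera et al. (2004); it assumes only two end members, noise-free linear mixing, and an initial guess taken from the most extreme samples of the first tracer (standing in for the outliers that a PCA scores plot might suggest). It then alternates between updating the proportions and updating the end-member compositions by least squares. All names and values are hypothetical.

```python
def estimate_two_end_members(samples, n_iter=50):
    """Alternately estimate fractions and compositions for two end members."""
    n_tracers = len(samples[0])
    # Initial guess (assumption): samples with the lowest and highest value
    # of the first tracer approximate the two end members.
    e1 = list(min(samples, key=lambda s: s[0]))
    e2 = list(max(samples, key=lambda s: s[0]))
    fractions = [0.5] * len(samples)
    for _ in range(n_iter):
        # Step 1: with end members fixed, project each sample onto the line
        # from e2 to e1 and clip the fraction of e1 to the range [0, 1].
        d = [a - b for a, b in zip(e1, e2)]
        dd = sum(v * v for v in d)
        if dd == 0.0:
            break
        fractions = [
            min(1.0, max(0.0, sum((x - b) * v for x, b, v in zip(s, e2, d)) / dd))
            for s in samples
        ]
        # Step 2: with fractions fixed, solve the 2x2 normal equations of
        # x = f * e1 + (1 - f) * e2 for each tracer.
        a11 = sum(f * f for f in fractions)
        a12 = sum(f * (1.0 - f) for f in fractions)
        a22 = sum((1.0 - f) ** 2 for f in fractions)
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            break
        for j in range(n_tracers):
            b1 = sum(f * s[j] for f, s in zip(fractions, samples))
            b2 = sum((1.0 - f) * s[j] for f, s in zip(fractions, samples))
            e1[j] = (b1 * a22 - b2 * a12) / det
            e2[j] = (a11 * b2 - a12 * b1) / det
    return e1, e2, fractions


# Hypothetical known end members and fractions, used to build mixed samples.
true_e1, true_e2 = [10.0, 100.0], [50.0, 20.0]
true_f = [1.0, 0.75, 0.5, 0.25, 0.0]
samples = [[f * a + (1.0 - f) * b for a, b in zip(true_e1, true_e2)] for f in true_f]
est_e1, est_e2, est_f = estimate_two_end_members(samples)
```

Because the hypothetical dataset includes the pure end-member samples, the estimates can be compared directly with the known values. When no pure end-member samples are present, this noise-free sketch keeps its estimates at the most extreme samples and does not extrapolate beyond the sampled range, which is one reason a fuller inverse model and a conceptual model of the flow system are needed.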
To test the approach, multivariate datasets for three hypothetical scenarios were synthetically generated assuming linear mixing of end members so that the expected results would be known a priori and could be compared with the mixing model results. For the first two scenarios, mixing proportions of three assumed end members were generated, and hydrochemical compositions representing water samples were calculated from these end members and mixing proportions. For the third scenario, two actual groundwater samples representing possible end-member waters were mechanically combined into 12 mixed samples of varying proportions, and then analyzed for major ion concentrations. Mixing proportions and end-member compositions estimated by using the end-member mixing model were compared to the known values. Finally, the multivariate statistical approach was applied to a previously published study of end-member mixing in groundwater, and mixing model results were compared to the results of that study. The analyses presented in this paper demonstrate that the combination of PCA and end-member mixing together with a conceptual understanding of the hydrological system in a systematic approach is useful in characterizing and quantifying mixing in hydrological systems.
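The generation of synthetic mixtures under the linear-mixing assumption can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to build the scenarios in this paper: the end-member compositions are hypothetical, and the mixing proportions are drawn as uniform random numbers normalized to sum to one.

```python
import random


def generate_mixtures(end_members, n_samples, seed=0):
    """Generate mixing proportions and the resulting sample compositions.

    end_members: list of compositions, one list of tracer values per end member.
    Returns (proportions, samples); each sample is a linear mixture.
    """
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    n_end = len(end_members)
    n_tracers = len(end_members[0])
    proportions, samples = [], []
    for _ in range(n_samples):
        # Random non-negative weights normalized so that they sum to one.
        raw = [rng.random() for _ in range(n_end)]
        total = sum(raw)
        p = [r / total for r in raw]
        # Linear mixing: each tracer is the proportion-weighted sum.
        x = [sum(p[k] * end_members[k][j] for k in range(n_end))
             for j in range(n_tracers)]
        proportions.append(p)
        samples.append(x)
    return proportions, samples


# Hypothetical end members with three tracers each (for example, mg/L).
END_MEMBERS = [
    [5.0, 40.0, 2.0],
    [60.0, 10.0, 8.0],
    [20.0, 90.0, 30.0],
]
known_props, mixed = generate_mixtures(END_MEMBERS, n_samples=20, seed=1)
```

Because `known_props` and `END_MEMBERS` are retained, any estimates produced by a mixing model applied to `mixed` can be compared against them directly, which is the purpose of testing on synthetic data.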
Synthetic datasets and overview of three scenarios
Three scenarios are presented, each of which includes synthetic hydrochemical data, where these samples are varying mixtures of two or three end members. These datasets are intended as analogues to groundwater or surface-water systems that involve the mixing of waters from different sources, or end members. For the three scenarios presented, predetermined end-member compositions and mixing proportions used to generate the mixed-water compositions are referred to as known end-member compositions
Results
The results of the three scenarios are intended to verify that the model functions as designed by comparing model-estimated values to known values. To evaluate the end-member inverse model, it was compared with a different end-member mixing approach (Christopherson and Hooper, 1992). Also presented is a partial model validation that compares the results of the method presented in this paper to the results from the method described by Huntoon (1981), which also shows an example of
Discussion
The method presented in this paper was applied to data from Huntoon (1981) as a simple example of a study-site application. This example showed some of the complications associated with application to an actual study site but also some advantages over the method used by Huntoon (1981). For example, site 7 was an outlier on the scores plot (Fig. 8b), and thus could have been selected to approximate an end member, but the scores plot also indicated that this point probably was not directly
Conclusions
Principal components analysis and an end-member mixing model were combined in a multivariate statistical approach to estimate unknown end-member hydrochemical compositions in water and the relative mixing proportions of those end members in mixed waters. The method was tested and shown to be effective in estimating these end members and mixing proportions, which were known a priori.
This method was tested on controlled datasets and found effective in estimating these end members and mixing
Acknowledgements
The authors would like to graciously thank the developers and engineers of the CAMO software, who provided much insight into the workings and understanding of The Unscrambler® model. We thank John Chartier, CAMO Senior Account Manager, for his time and effort with questions and resources needed when using The Unscrambler®. Angela Schmidt, CAMO Chemical Engineer, helped with technical questions and interpretations of the PCA plots. Last, we would like to thank Pallaoor V Sundareshwar and John
References (36)
- et al. Sr–Nd–Pb isotope systematics in Amazon and Congo River systems. Constraints about erosion processes. Chem. Geol. (1996)
- et al. Multivariate mixing and mass balance (M3) calculations, a new tool for decoding hydrogeochemical information. Appl. Geochem. (1999)
- et al. Multivariate analyses with end-member mixing to characterize groundwater flow: Wind Cave and associated aquifers. J. Hydrol. (2011)
- et al. The ‘principal components’ statistical method as a complementary approach to geochemical methods in water quality factor identification; application to the Coastal Plain aquifer of Israel. J. Hydrol. (1992)
- et al. Geochemical and statistical evidence of recharge, mixing, and controls on spring discharge in a eogenetic karst aquifer. J. Hydrol. (2009)
- et al. Geochemistry of dissolved and suspended loads of the Seine River, France: anthropogenic impact, carbonate and silicate weathering. Geochim. Cosmochim. Acta (1999)
- et al. Observations of anthropogenic inputs of the isoprene oxidation products methyl vinyl ketone and methacrolein to the atmosphere. Geophys. Res. Lett. (1997)
- Bro, R., Acar, E., Kolda, T., 2007. Resolving the Sign Ambiguity in the Singular Value Decomposition. Sandia National...
- CAMO Software AS Inc., 2010, The Unscrambler® v9.8, Multivariate Data Analysis Software Program....
- et al. Data transformation and standardization in the multivariate analysis of river water quality. Ecol. Appl. (1998)
- A methodology to computing mixing ratios with uncertain end-members. Water Resour. Res.
- Multivariate analysis of stream water chemical data: the use of principal components analysis for the end-member mixing problem. Water Resour. Res.
- Statistics and Data Analysis in Geology
- Quantification of karst aquifer discharge components during storm events through end-member mixing analysis using natural chemistry and stable isotopes as tracers. Hydrogeol. J.