Factor analysis applied to regional geochemical data: problems and possibilities
Introduction
The principal aim of factor analysis, which was initially developed by psychologists, is to explain the variation in a multivariate data set by as few “factors” as possible and to detect hidden multivariate data structures. What psychologists call a “factor” corresponds to a “controlling process” in geochemistry. In theory, therefore, factor analysis should be ideally suited to presenting concisely the “essential” information contained in a geochemical data set with many analysed elements.
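In standard textbook notation (the symbols are chosen here and are not taken from the paper), this idea corresponds to the orthogonal factor model. Each of the p standardised variables in the vector x is written as a linear combination of m ≪ p common factors f plus a variable-specific term ε. Under the model's assumptions, this splits the correlation matrix R into a common part and a diagonal unique part Ψ:

```latex
\mathbf{x} = \boldsymbol{\Lambda}\,\mathbf{f} + \boldsymbol{\varepsilon},
\qquad
\mathbf{R} = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\mathsf{T}} + \boldsymbol{\Psi},
\qquad
h_j^{2} = \sum_{k=1}^{m} \lambda_{jk}^{2} = 1 - \psi_{j}
```

Here Λ is the p × m matrix of loadings λ_jk. The factors are assumed to be uncorrelated, with unit variance, and uncorrelated with ε. The quantity h_j² is the communality of variable j, that is, the share of its variance explained by the common factors. In the geochemical reading above, one column of Λ describes how strongly each element takes part in one “controlling process”.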
In regional geochemistry, one advantage would be that maps of only 4–6 factors, containing a high percentage of the information in the single-element maps, might need to be presented instead of maps for 40–50 (or more) elements. It is even more informative if factor analysis reveals previously unrecognised multivariate structures in the data that may indicate certain geochemical processes or, in exploration geochemistry, hidden mineral deposits. Factor analysis has been used successfully for this purpose (e.g. 23, 16, 15, 17), but it remains a controversial method. A focal point of criticism is that too many different techniques are available, all giving slightly different results (Rock, 1988). It is argued that statistically untrained users will thus always be tempted to experiment until they find a solution that fits their preconceptions. Another problem is that users are not always aware of some of the basic requirements for a successful factor analysis. As a result, factor analysis is very often applied merely as an exploratory tool, and the results could often have been predicted using much simpler methods.
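One rough way to judge how much of the total variation a handful of factors can carry is to inspect the eigenvalues of the correlation matrix. The sketch below assumes NumPy and uses invented data: 605 samples and 10 hypothetical "elements" driven by two latent processes. The helper `eigen_summary` is illustrative and not from the study. These are principal-component eigenvalues, and the "eigenvalue > 1" rule used here is only one common heuristic for choosing the number of factors; Cattell's scree test is another.

```python
import numpy as np


def eigen_summary(X):
    """Eigenvalues of the correlation matrix (largest first) and cumulative share of variance."""
    R = np.corrcoef(X, rowvar=False)       # p x p correlation matrix of the variables
    eigvals = np.linalg.eigvalsh(R)[::-1]  # eigvalsh returns ascending order, so reverse
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return eigvals, cumulative


# Hypothetical data: 605 samples, 10 "elements" driven by two latent processes plus noise.
rng = np.random.default_rng(0)
n, p = 605, 10
loadings = np.zeros((p, 2))
loadings[:5, 0] = 0.9   # elements 1-5 follow process 1
loadings[5:, 1] = 0.9   # elements 6-10 follow process 2
X = rng.standard_normal((n, 2)) @ loadings.T + 0.3 * rng.standard_normal((n, p))

eigvals, cumulative = eigen_summary(X)
n_kaiser = int(np.sum(eigvals > 1.0))   # Kaiser's "eigenvalue > 1" rule of thumb
print(np.round(eigvals, 2))
print(f"{n_kaiser} factors carry {cumulative[n_kaiser - 1]:.0%} of the total variance")
```

For a correlation matrix, the eigenvalues always sum to the number of variables. The cumulative share therefore corresponds directly to the "high percentage of the information" that a few factor maps would retain.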
Using a large, regional-scale geochemical data set containing 605 samples and more than 50 variables, this study aims to answer some fundamental questions about the use of factor analysis in geochemistry:
- Can (and should) factor analysis be applied to such a high-dimensional data set? What are the prerequisites for applying factor analysis to such a data set?
- What are the results of factor analysis when such a large data set is investigated?
- Can the information contained in more than 50 single-element maps be presented in just a few (e.g. 3–6) factor maps?
Furthermore, an attempt is made to determine which parameters have the largest influence on the results of factor analysis. The influence of the following will be discussed:
- the actual method used (principal factor analysis (PFA) versus maximum likelihood (ML), method of factor rotation),
- the number of factors extracted, and
- the number of elements entered into the factor analysis.
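The sketch below shows how the extraction method and the rotation enter the calculation. It uses NumPy only and is not the software used in the study. Factors are extracted by iterated principal-axis factoring, one common way of carrying out PFA; ML extraction is not shown. The factors are then rotated with raw varimax, computed by Kaiser's pairwise planar rotations. Varimax is assumed here only as a typical orthogonal rotation. The function names, the synthetic data and the choice of two factors are all hypothetical.

```python
import numpy as np


def principal_axis_factoring(X, n_factors, max_iter=500, tol=1e-6):
    """Iterated principal-axis (principal factor) extraction from the correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    # Initial communalities: squared multiple correlation of each variable with all others.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)                  # shared variance only on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)     # ascending eigenvalues
        top = np.argsort(eigvals)[::-1][:n_factors]    # indices of the largest ones
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        new_h2 = np.sum(loadings**2, axis=1)           # updated communalities
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2


def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Raw varimax via Kaiser's pairwise (planar) rotations; no row normalisation."""
    L = np.array(loadings, dtype=float)
    p, k = L.shape
    for _ in range(max_sweeps):
        largest = 0.0
        for a in range(k - 1):
            for b in range(a + 1, k):
                x, y = L[:, a].copy(), L[:, b].copy()
                u, v = x**2 - y**2, 2.0 * x * y
                A, B = u.sum(), v.sum()
                C, D = np.sum(u**2 - v**2), 2.0 * np.sum(u * v)
                # Rotation angle that maximises the varimax criterion for this column pair.
                phi = 0.25 * np.arctan2(D - 2.0 * A * B / p, C - (A**2 - B**2) / p)
                L[:, a] = x * np.cos(phi) + y * np.sin(phi)
                L[:, b] = -x * np.sin(phi) + y * np.cos(phi)
                largest = max(largest, abs(phi))
        if largest < tol:
            break
    return L


# Hypothetical data: 10 elements, two latent processes of unequal strength.
rng = np.random.default_rng(1)
n, p = 605, 10
true_loadings = np.zeros((p, 2))
true_loadings[:6, 0] = 0.9
true_loadings[6:, 1] = 0.8
X = rng.standard_normal((n, 2)) @ true_loadings.T + 0.4 * rng.standard_normal((n, p))

unrotated, communalities = principal_axis_factoring(X, n_factors=2)
rotated = varimax(unrotated)
print(np.round(unrotated, 2))
print(np.round(rotated, 2))   # each element should now load mainly on one factor
```

An orthogonal rotation only redistributes variance among the factors, so the communalities (the row sums of squared loadings) are unchanged. What changes is which elements appear together on a factor, and therefore what the factor maps look like.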
The Kola project
From 1992 to 1998, the Geological Surveys of Finland (GTK) and Norway (NGU) and the Central Kola Expedition (CKE), Russia, carried out a large, international, multi-media, multi-element geochemical mapping project covering 188,000 km² north of the Arctic Circle. The entire area between 24 and 35.5°E up to the Barents Sea coast (Fig. 1) was sampled during the summer of 1995. Results of the “Kola Ecogeochemistry” project are documented on a web site (http://www.ngu.no/Kola) and in a geochemical atlas
Results
Table 1 summarises the variables and analytical results: minimum, maximum, mean, median, standard deviation and median absolute deviation (MAD) of the data, and p values of a Kolmogorov–Smirnov test for normality (see Afifi and Azen, 1979) for the original, the log-transformed and the Box–Cox-transformed (Box and Cox, 1964) data.
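The quantities listed in Table 1 are simple to compute. The sketch below uses NumPy and the standard library only, and its choices are our assumptions rather than the authors' procedure:

- The Box–Cox λ is chosen by maximising the profile log-likelihood on a grid from −2 to 2.
- The Kolmogorov–Smirnov test compares the data with a normal distribution whose mean and standard deviation are estimated from the same data.
- The p value uses the usual asymptotic approximation.

Strictly, estimating the parameters from the sample makes these p values too large; the Lilliefors correction addresses this. The data and all function names are hypothetical.

```python
import math

import numpy as np


def ks_normal(x):
    """KS distance to a normal fitted by mean and sd, with an asymptotic (approximate) p value."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    z = (x - x.mean()) / x.std(ddof=1)
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    i = np.arange(1, n + 1)
    d = max(np.max(i / n - cdf), np.max(cdf - (i - 1) / n))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:                      # the Kolmogorov tail probability is ~1 here
        return d, 1.0
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam) for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def box_cox(x, lambdas=np.linspace(-2.0, 2.0, 401)):
    """Box-Cox transform; lambda maximises the profile log-likelihood on a grid (x must be > 0)."""
    x = np.asarray(x, dtype=float)
    log_x = np.log(x)
    n = x.size

    def transform(lam):
        return log_x if abs(lam) < 1e-12 else (x**lam - 1.0) / lam

    def loglik(lam):
        return -0.5 * n * np.log(transform(lam).var()) + (lam - 1.0) * log_x.sum()

    best = max(lambdas, key=loglik)
    return transform(best), float(best)


def summarise(x):
    """Univariate summary in the spirit of Table 1 (MAD here is unscaled)."""
    x = np.asarray(x, dtype=float)
    bc, lam = box_cox(x)
    return {
        "min": x.min(), "max": x.max(), "mean": x.mean(), "median": np.median(x),
        "sd": x.std(ddof=1), "mad": np.median(np.abs(x - np.median(x))),
        "ks_p_original": ks_normal(x)[1],
        "ks_p_log": ks_normal(np.log(x))[1],
        "ks_p_boxcox": ks_normal(bc)[1],
        "boxcox_lambda": lam,
    }


rng = np.random.default_rng(2)
element = rng.lognormal(mean=3.0, sigma=1.0, size=605)   # hypothetical right-skewed concentrations
for key, value in summarise(element).items():
    print(f"{key:>14}: {value:.3g}")
```

For right-skewed, roughly lognormal data like this, the test rejects normality for the original values. The fitted Box–Cox λ lies close to 0, which is the value at which the Box–Cox transform reduces to the log transform.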
Factor analysis is a very data-sensitive technique, a fact that is often neglected. A careful univariate analysis should be carried out for any data set prior to its being
Conclusions
The worked example using the C-horizon soil data of the Kola project demonstrated that, even when all prerequisites for a sensible factor analysis are neglected, interesting results may emerge that can be interpreted by geochemical reasoning. However, most of the results are governed by regional geology and could have been predicted from pre-existing information. Even the more informative results presented in Fig. 10 can all be extracted from a small selection of single-element maps as
Acknowledgments
The authors wish to thank the sampling and laboratory teams from Finland, Norway and Russia. Galina Kashulina (INEP, Apatity) and Viktor Melezhik (NGU, Trondheim) gave helpful comments on an earlier draft of this paper. Viktor Melezhik supplied us with the black-and-white version of the geological map used in this paper. Discussions on the results of factor analyses of the Kola C-horizon data with V. Chekushin, R. Dutter, F. Koller, H. Niskavaara and R. Salminen were appreciated. F. Koller
References
- et al. The use of tree bark for environmental pollution monitoring in the Czech Republic. Environ. Pollut. (1998).
- et al. Geochemical and metallogenic provinces: a discussion initiated by results from geochemical mapping across northern Fennoscandia. J. Geochem. Explor. (1990).
- et al. The fractal nature of geochemical landscapes. J. Geochem. Explor. (1992).
- Unmasking multivariate anomalous observations in exploration geochemical data from sheeted-vein tin mineralisation near Emmaville, N.W.S., Australia. J. Geochem. Explor. (1990).
- et al. Comparison of interpretations of geochemical soil data by some multivariate statistical methods, Key Anacon, N.B., Canada. J. Geochem. Explor. (1985).
- et al. Interpreting geochemical data from Outokumpu, Finland: an MVE-robust factor analysis. J. Geochem. Explor. (1993).
- The chi-square plot: a tool for multivariate outlier recognition. J. Geochem. Explor. (1989).
- et al. The management, analysis and display of regional geochemical data. J. Geochem. Explor. (1980).
- Exploratory data analysis: recent advances for the interpretation of geochemical data. J. Geochem. Explor. (1988).
- et al. Reductive coprecipitation as a separation method for the determination of gold, palladium, platinum, rhodium, silver, selenium and tellurium in geological samples by graphite furnace atomic absorption spectrometry. Anal. Chim. Acta (1990).
- Statistical Analysis: A Computer Oriented Approach.
- The Statistical Analysis of Compositional Data.
- The use of transformations. Biometrics.
- The statistical analysis of variance-heterogeneity and the logarithmic transformation. J. Roy. Stat. Soc.
- Statistical Factor Analysis and Related Methods. Theory and applications.
- An analysis of transformations. J. Roy. Stat. Soc. (B).
- Principal components analysis using the hypothetical closed array. Math. Geol.
- An analytic solution for approximating simple structure in factor analysis. Psychometrika.
- The scree test for the number of factors. Mult. Behav. Res.
- Glacigenic deposits.
- Robust principal component and factor analysis in the geostatistical treatment of environmental data. Environmetrics.
- Modern Factor Analysis, 3rd Edition.
1 Present address: Margaretenstr. 110/6, A-150 Wien, Austria.