Elsevier

Applied Geochemistry

Volume 17, Issue 3, March 2002, Pages 185-206

Factor analysis applied to regional geochemical data: problems and possibilities

https://doi.org/10.1016/S0883-2927(01)00066-X

Abstract

A large regional geochemical data set of C-horizon podzol samples from a 188,000 km² area in the European Arctic, analysed for more than 50 elements, was used to test how different variants of factor analysis influence the results extracted. Owing to the nature of regional geochemical data (neither normal nor log-normal, strongly skewed, often multi-modal data distributions), the simplest methods of factor analysis, with the fewest statistical assumptions, perform best. As a result of this test, principal factor analysis with an orthogonal rotation can generally be recommended for such data. Selecting the number of factors to extract is difficult; however, the scree plot provides some useful help. For the test data, extracting a low number of factors gave the most informative results. Deleting or adding just one element in the input matrix can drastically change the results of factor analysis. Given that the selection of elements is often based on the availability of analytical packages (or on detection limits) rather than on geochemical reasoning, this is a disturbing result. Factor analysis revealed the most interesting data structures when only a small number of variables was entered. A graphical presentation of the loadings and a simple, automated mapping technique allow the most interesting results of different factor analyses to be extracted at a glance. The results presented here underline the importance of careful univariate data analysis before factor analysis is applied. Outliers should be removed from the data set and different populations present in the data should be treated separately. Factor analysis can be used to explore a large data set for hidden multivariate data structures.

Introduction

The principal aim of factor analysis, which was initially developed by psychologists, is to explain the variation in a multivariate data set by as few “factors” as possible and to detect hidden multivariate data structures. The term “factor” as used by psychologists corresponds to “controlling processes” in geochemistry. Thus, in theory, factor analysis should be ideally suited to presenting, in a simple form, the “essential” information inherent in a geochemical data set with many analysed elements.
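In the usual textbook notation (the symbols below are introduced here for illustration and are not taken from the original paper), the common factor model for p standardised variables and k < p factors can be written as follows. The loadings describe how strongly each element is associated with each factor, and the communality of an element is the share of its variance explained by the common factors.

```latex
\mathbf{x} = \boldsymbol{\Lambda}\,\mathbf{f} + \boldsymbol{\varepsilon},
\qquad
\mathbf{R} = \operatorname{Cov}(\mathbf{x}) = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi},
\qquad
h_j^{2} = \sum_{l=1}^{k} \lambda_{jl}^{2}, \quad 1 = h_j^{2} + \psi_j .
```

Here x holds the p standardised element concentrations, Λ = (λ_jl) is the p × k loading matrix, f are k uncorrelated common factors with zero mean and unit variance, and ε are unique factors, uncorrelated with f, with diagonal covariance Ψ = diag(ψ_1, …, ψ_p).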

In regional geochemistry, an advantage would be that instead of presenting maps for 40–50 (or more) elements, only maps of 4–6 factors would need to be presented, containing a high percentage of the information in the single-element maps. It is even more informative if factor analysis can reveal unrecognised multivariate structures in the data that may be indicative of certain geochemical processes or, in exploration geochemistry, of hidden mineral deposits. Factor analysis has been successfully used for this purpose (e.g. 23, 16, 15, 17), but it is still a controversial method. A focal point of criticism is that too many different techniques are available, all giving slightly different results (Rock, 1988). It is argued that statistically untrained users will therefore always be tempted to experiment until they find a solution that fits their preconceptions. Another problem is that users are not always aware of some of the basic requirements for carrying out a successful factor analysis. As a result, factor analysis is very often applied merely as an exploratory tool, and the results could often have been predicted using much simpler methods.
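To make the idea of condensing many element maps into a few factors concrete, here is a minimal Python/NumPy sketch of one common variant of principal factor analysis: iterated principal-axis factoring of the correlation matrix. It is an illustration under stated assumptions, not the implementation used in the paper. The function name `principal_factor_analysis`, the squared-multiple-correlation starting communalities and the convergence settings are choices made here. The function also returns the eigenvalues of the full correlation matrix, which are the values a scree plot displays.

```python
import numpy as np


def principal_factor_analysis(data, n_factors, max_iter=100, tol=1e-6):
    """Iterated principal-axis factoring of the correlation matrix.

    data: array of shape (n_samples, n_variables), one column per element.
    Returns (loadings, communalities, scree_eigenvalues).
    """
    x = np.asarray(data, dtype=float)
    r = np.corrcoef(x, rowvar=False)

    # Eigenvalues of the unreduced correlation matrix, largest first: the
    # values a scree plot shows when choosing how many factors to keep.
    scree = np.linalg.eigvalsh(r)[::-1]

    # Starting communalities: squared multiple correlations, 1 - 1/diag(R^-1).
    # Requires R to be invertible (no perfectly collinear variables).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(r))

    loadings = np.zeros((r.shape[0], n_factors))
    for _ in range(max_iter):
        reduced = r.copy()
        np.fill_diagonal(reduced, h2)  # only common variance on the diagonal
        vals, vecs = np.linalg.eigh(reduced)
        top = np.argsort(vals)[::-1][:n_factors]
        # Loadings = eigenvectors scaled by sqrt(eigenvalue); negative
        # eigenvalues of the reduced matrix are clipped to zero.
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        new_h2 = np.sum(loadings ** 2, axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        # No guard against Heywood cases (a communality exceeding 1).
        if converged:
            break
    return loadings, h2, scree
```

The sign of each loading column is arbitrary, and the unrotated loadings are usually rotated before interpretation. Factor scores for mapping are not computed in this sketch.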

Using a large regional-scale geochemical data set containing 605 samples and more than 50 variables, this study aims to answer some fundamental questions regarding the use of factor analysis in geochemistry:

  • Can (and should) factor analysis be applied to such a high-dimensional data set? What are the prerequisites for applying factor analysis to such a data set?
  • What are the results of factor analysis when such a large data set is investigated?
  • Can the information contained in more than 50 single-element maps be presented in just a few (e.g. 3–6) factor maps?

Furthermore, an attempt is made to determine which parameters have the largest influence on the results of factor analysis. The influence of the following parameters will be discussed:

  • the actual method used (principal factor analysis (PFA) versus maximum likelihood (ML), and the method of factor rotation),
  • the number of factors extracted, and
  • the number of elements entered into the factor analysis.
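Since the method of factor rotation is one of the parameters examined, here is a minimal Python/NumPy sketch of one orthogonal rotation: raw varimax (without Kaiser row normalisation), computed with the widely used SVD-based iteration. It is an illustration only. The function name and default tolerances are assumptions made here, and the software used for the paper may have applied Kaiser normalisation or a different algorithm. Setting `gamma=0` gives quartimax instead of varimax.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=500, tol=1e-8):
    """Orthomax rotation of a (p x k) loading matrix; gamma=1 is varimax.

    Returns (rotated_loadings, rotation_matrix). No Kaiser normalisation.
    """
    a = np.asarray(loadings, dtype=float)
    p, k = a.shape
    rot = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        lam = a @ rot
        # Gradient-like target of the orthomax criterion.
        target = lam ** 3 - (gamma / p) * lam @ np.diag(np.sum(lam ** 2, axis=0))
        u, s, vt = np.linalg.svd(a.T @ target)
        rot = u @ vt  # nearest orthogonal matrix (polar factor)
        new_crit = np.sum(s)
        # Stop once the criterion no longer increases noticeably.
        if crit > 0 and new_crit < crit * (1.0 + tol):
            break
        crit = new_crit
    return a @ rot, rot
```

Because the rotation matrix is orthogonal, each element's communality (the row sum of squared loadings) is unchanged. Only the distribution of variance among the factors changes, which is why different rotations can suggest different geochemical interpretations of the same solution.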


The Kola project

From 1992 to 1998, the Geological Surveys of Finland (GTK) and Norway (NGU) and the Central Kola Expedition (CKE), Russia, carried out a large, international, multi-media, multi-element geochemical mapping project covering 188,000 km² north of the Arctic Circle. The entire area between 24 and 35.5°E, up to the Barents Sea coast (Fig. 1), was sampled during the summer of 1995. Results of the “Kola Ecogeochemistry” project are documented on a web site (http://www.ngu.no/Kola) and in a geochemical atlas

Results

Table 1 summarises the variables and analytical results: the minimum, maximum, mean, median, standard deviation and median absolute deviation (MAD) of the data, and the p values of a Kolmogorov–Smirnov test for normality (see Afifi and Azen, 1979) for the original, the log-transformed and the Box–Cox (Box and Cox, 1964) transformed data.
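A minimal sketch of the kind of univariate summary described for Table 1, in plain Python (standard library only). Several details are assumptions made here rather than taken from the paper: the function names, base-10 logarithms for the log transform, an unscaled MAD, a grid search over λ from -2 to 2 for the Box–Cox exponent, and reporting only the Kolmogorov–Smirnov distance D rather than a p value. Because the normal mean and SD are estimated from the data, classical KS p values would need a Lilliefors-type correction, which is not implemented. All input values must be strictly positive.

```python
import math
import statistics
from statistics import NormalDist


def mad(values):
    """Median absolute deviation from the median (unscaled).

    Multiply by 1.4826 for an estimate comparable to the SD of normal data.
    """
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def ks_statistic_normal(values):
    """KS distance D between the empirical CDF and a normal CDF fitted with
    the sample mean and SD. Only D is returned (no p value)."""
    xs = sorted(values)
    n = len(xs)
    dist = NormalDist(statistics.fmean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = dist.cdf(x)
        # Compare the fitted CDF with the ECDF just after and just before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def box_cox(values, lam):
    """Box-Cox transform; lam == 0 is the natural-log limit."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def box_cox_lambda(values, grid=None):
    """Choose lambda maximising the Box-Cox profile log-likelihood on a grid."""
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]  # -2.00 ... 2.00
    n = len(values)
    sum_log = sum(math.log(v) for v in values)

    def profile_loglik(lam):
        y = box_cox(values, lam)
        m = statistics.fmean(y)
        var = sum((t - m) ** 2 for t in y) / n  # ML variance estimate
        return -0.5 * n * math.log(var) + (lam - 1.0) * sum_log

    return max(grid, key=profile_loglik)


def summarise(values):
    """Table-1-style summary of one variable plus KS distances for three versions."""
    lam = box_cox_lambda(values)
    versions = {
        "original": list(values),
        "log10": [math.log10(v) for v in values],
        "boxcox": box_cox(values, lam),
    }
    summary = {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "mad": mad(values),
        "boxcox_lambda": lam,
    }
    for name, transformed in versions.items():
        summary["ks_D_" + name] = ks_statistic_normal(transformed)
    return summary
```

A smaller D indicates a closer fit to normality. For strongly skewed element concentrations the log or Box–Cox version will usually give a much smaller D than the original data, although multi-modal distributions may remain far from normal under any of these transforms.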

Factor analysis is a very data-sensitive technique, a fact that is often neglected. A careful univariate analysis should be carried out for any data set prior to its being

Conclusions

The worked example using the C-horizon soil data of the Kola project demonstrated that even when all prerequisites for a sensible factor analysis are neglected, interesting results may emerge that can be interpreted on the basis of geochemical reasoning. However, most of the results are governed by regional geology and could have been predicted from pre-existing information. Even the more informative results presented in Fig. 10 can all be extracted from a small selection of single element maps as

Acknowledgments

The authors wish to thank the sampling and laboratory teams from Finland, Norway and Russia. Galina Kashulina (INEP, Apatity) and Viktor Melezhik (NGU, Trondheim) gave helpful comments on an earlier draft of this paper. Viktor Melezhik supplied us with the black and white version of the geological map used in this paper. Discussions on the results of factor analyses of the Kola C-horizon data with V. Chekushin, R. Dutter, F. Koller, H. Niskavaara and R. Salminen were appreciated. F. Koller

References (53)

  • A.A. Afifi et al. Statistical Analysis: A Computer Oriented Approach (1979)
  • J. Aitchison. The Statistical Analysis of Compositional Data (1986)
  • M. Äyräs, C. Reimann. Joint ecogeochemical mapping and monitoring in the scale of 1:1 mill. in the West... (1995)
  • M.S. Bartlett. The use of transformations. Biometrics (1947)
  • M.S. Bartlett et al. The statistical analysis of variance-heterogeneity and the logarithmic transformation. J. Roy. Stat. Soc. (1946)
  • A. Basilevsky. Statistical Factor Analysis and Related Methods: Theory and Applications (1994)
  • B. Bølviken, J. Bergström, A. Björklund, M. Kontio, P. Lehmuspelto, T. Lindholm, J. Magnusson, R.T. Ottesen,...
  • G.E.P. Box et al. An analysis of transformations. J. Roy. Stat. Soc. (B) (1964)
  • J.C. Butler. Principal components analysis using the hypothetical closed array. Math. Geol. (1976)
  • J.B. Carroll. An analytic solution for approximating simple structure in factor analysis. Psychometrika (1953)
  • R.B. Cattell. The scree test for the number of factors. Mult. Behav. Res. (1966)
  • R. Dutter, T. Leitner, C. Reimann, F. Wurzer. Grafische und geostatistische Analyse am PC. Beiträge zur... (1992)
  • K. Eriksson. Glacigenic deposits
  • P. Filzmoser. Robust principal component and factor analysis in the geostatistical treatment of environmental data. Environmetrics (1999)
  • R.G. Garrett, I. Nichol. Factor analysis as an aid in the interpretation of regional geochemical stream... (1969)
  • H.H. Harman. Modern Factor Analysis, 3rd Edition (1976)

1 Present address: Margaretenstr. 110/6, A-150 Wien, Austria.
