Abstract
Common multivariate clustering techniques are ineffective at identifying subtle patterns of correlation and clustering among variables or samples in complex geochemical datasets. This study compares the combination of singular value decomposition (SVD) and semi-discrete decomposition (SDD) with hierarchical cluster analysis (HCA) for examining patterns within a multielement soil geochemical dataset from an agricultural area in the vicinity of Pb–Zn mining operations in central Iran. SVD was used both to identify patterns of correlation between variables and samples and to “denoise” the data, and SDD was used to simultaneously cluster the samples and variables. The results reveal various spatial associations among the mining waste-associated metals As, Ba, Pb and Zn, and among the remaining elements, whose distribution is largely controlled by the major oxides. SVD–SDD was found to be superior to HCA in its ability to detect subtle clusters in soil geochemistry indicative of mine-related contamination in the study area.
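The SVD "denoising" step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a samples-by-variables matrix, column standardisation, and a retained rank of 2; the function name `svd_denoise`, the synthetic data, and these preprocessing choices are illustrative assumptions and not the procedure reported by the authors. The SDD clustering step is not shown.

```python
import numpy as np


def svd_denoise(data, rank):
    """Return a rank-`rank` approximation of a samples-by-variables matrix.

    Columns are standardised first (an assumption made for this sketch),
    then only the `rank` largest singular triplets are kept.
    """
    standardised = (data - data.mean(axis=0)) / data.std(axis=0)
    # Thin SVD: standardised = u @ diag(s) @ vt, with s sorted in descending order.
    u, s, vt = np.linalg.svd(standardised, full_matrices=False)
    # Rebuild the matrix from the leading components only.
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    return approx, u[:, :rank], s, vt[:rank, :]


# Synthetic stand-in for a geochemical table: 50 samples, 6 variables,
# driven by two latent factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
loadings = rng.normal(size=(2, 6))
data = latent @ loadings + 0.05 * rng.normal(size=(50, 6))

approx, scores, singular_values, components = svd_denoise(data, rank=2)
```

Here `scores` places each sample in the space of the retained components and `components` holds the corresponding variable loadings, so samples or variables with similar patterns plot close together. The choice of rank would normally be guided by the decay of `singular_values`.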
Acknowledgments
The authors thank the Iranian Geological Survey for its support and for the soil analysis, Islamic Azad University (Bafgh branch), and Dr. Soleimani (Isfahan University of Technology) for his assistance with the project.
About this article
Cite this article
Zekri, H., Mokhtari, A.R. & Cohen, D.R. Application of singular value decomposition (SVD) and semi-discrete decomposition (SDD) techniques in clustering of geochemical data: an environmental study in central Iran. Stoch Environ Res Risk Assess 30, 1947–1960 (2016). https://doi.org/10.1007/s00477-016-1219-5
DOI: https://doi.org/10.1007/s00477-016-1219-5