Principal component analysis versus fuzzy principal component analysis: A case study: the quality of Danube water (1985–1996)
Introduction
Multivariate statistical methods for the analysis of large quantities of data have been applied to chemical and environmental systems in recent decades [1], [2], [3], [4]. One of these methods, principal component analysis (PCA), has shown special promise for furnishing new and unique insights into the interactions in a wide range of pollution and ecotoxicological situations [5], [6], [7], [8], [9], [10], [11].
PCA is designed to transform the original variables into new, uncorrelated variables (axes), called principal components, which are linear combinations of the original variables. The new axes lie along the directions of maximum variance. PCA provides an objective way of finding such indices so that the variation in the data can be accounted for as concisely as possible.
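The transformation described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available and using synthetic data; the function name `pca` and the toy mixing matrix are hypothetical and are not taken from the paper.

```python
import numpy as np

def pca(X):
    """Classical PCA via eigen-decomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                 # centre each variable
    cov = np.cov(Xc, rowvar=False)          # m x m sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                   # sample coordinates on the new axes
    return eigvals, eigvecs, scores

# Synthetic correlated data (illustrative only, not the Danube measurements).
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing
eigvals, eigvecs, scores = pca(X)

# The scores are uncorrelated: their covariance matrix is diagonal,
# with the eigenvalues (variances along each new axis) on the diagonal.
score_cov = np.cov(scores, rowvar=False)
```

Each column of `eigvecs` holds the coefficients of one linear combination of the original variables, and the corresponding entry of `eigvals` is the variance captured along that axis.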
Principal component analysis, like any other multivariate statistical method, is sensitive to outliers, missing data, and poor linear correlation between variables due to poorly distributed variables. As a result, the classical principal components may describe the shape of the majority of the data incorrectly. It is therefore necessary to apply robust methods that are resistant to possible outliers [12]. To this end, two robust approaches have been developed in recent decades. The first is based on the eigenvectors of a robust covariance matrix, such as the MCD-estimator [13] or S-estimators of location and shape [14], [15], and is limited to relatively low-dimensional data. The second approach is based on projection pursuit and can handle high-dimensional data [16]. A robust PCA approach that combines projection pursuit ideas with robust estimation of low-dimensional data has also been developed [17] and applied to several bio-chemical datasets [18].
However, one of the most promising approaches to “robustify” PCA appears to be the fuzzification of the data matrix to diminish the influence of outliers [19], [20], [21], [22], [23], [24], [25], [26].
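The general idea of diminishing the influence of outliers can be illustrated with a weighted covariance matrix, in which each sample carries a membership degree between 0 and 1. The sketch below is only an illustration under stated assumptions: the membership function (based on distance from the coordinate-wise median) and the names `memberships` and `weighted_pca` are hypothetical, and this is not the FPCA algorithm of [27].

```python
import numpy as np

def memberships(X):
    """Hypothetical membership degrees in (0, 1]: close to 1 near the bulk
    of the data, close to 0 for samples far from it (an assumed choice)."""
    centre = np.median(X, axis=0)            # robust location estimate
    d = np.linalg.norm(X - centre, axis=1)   # distance of each sample to it
    scale = np.median(d)                     # typical distance
    return 1.0 / (1.0 + (d / scale) ** 2)

def weighted_pca(X, u):
    """PCA on a covariance matrix in which sample i has weight u[i]."""
    mu = (u[:, None] * X).sum(axis=0) / u.sum()
    Xc = X - mu
    cov = (u[:, None] * Xc).T @ Xc / u.sum()
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]        # decreasing variance
    return eigvals[order], eigvecs[:, order]

# Toy data: 51 points spread along the x axis plus one gross outlier in y.
x = np.linspace(-5.0, 5.0, 51)
y = 0.1 * (-1.0) ** np.arange(51)
X = np.vstack([np.column_stack([x, y]), [[0.0, 40.0]]])

classical_vals, classical_vecs = weighted_pca(X, np.ones(len(X)))
fuzzy_vals, fuzzy_vecs = weighted_pca(X, memberships(X))
```

On this toy data the single outlier pulls the first classical axis toward the y direction, whereas the downweighted version keeps its first axis along x, where the bulk of the data lies.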
In this paper we discuss and apply a robust fuzzy PCA algorithm (FPCA) [27]. The efficiency of the new algorithm is illustrated on a data set concerning the water quality of the Danube River for a period of 11 consecutive years.
Classical principal component analysis
Principal component analysis is also known as eigenvector analysis, eigenvector decomposition or Karhunen–Loève expansion. As mentioned above, the main purpose of PCA is to represent the location of the samples economically in a reduced coordinate system where, instead of m axes (corresponding to m characteristics), only p axes (p < m) are usually needed to describe the data set with the maximum possible information.
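One common way to pick p is to keep the smallest number of components whose cumulative share of the total variance reaches a chosen level. The sketch below assumes this criterion; the function name `choose_p`, the 85% threshold and the example eigenvalues are all hypothetical and are not taken from the paper.

```python
def choose_p(eigenvalues, threshold=0.85):
    """Return the smallest p whose leading eigenvalues explain at least
    `threshold` of the total variance (threshold is an assumed value)."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    cumulative = 0.0
    for p, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return p
    return len(ordered)

# Illustrative eigenvalues for m = 5 variables (made-up numbers).
p = choose_p([4.0, 2.0, 1.0, 0.5, 0.5], threshold=0.85)
```

Here the cumulative shares are 0.5, 0.75 and 0.875 for the first three components, so three of the five axes are retained.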
Principal component analysis practically transforms the original
Results and discussion
The data were collected at the Galaţi site, Romania, according to standardized methods for sampling, sample preparation and analysis of Danube River water, over a period of 11 consecutive years [30]. The Galaţi site was selected as representative of the Danube estuary region.
Nineteen different water parameters were checked monthly (pH, chemical oxygen demand (COD), equivalent oxygen, calcium, magnesium, calcium/magnesium ratio, chloride, sulphate, hydrogen carbonate, nitrite, nitrate, phosphate,
Conclusions
A fuzzy principal component analysis method for robust estimation of principal components has been applied in this paper. The efficiency of the new algorithm was illustrated on a data set concerning the quality of the Danube River. The FPCA method achieved better results mainly because it compresses the data more effectively than classical PCA, i.e. the first fuzzy principal component accounts for significantly more of the variance than its classical counterpart. Considering, for example, a two component
References (30)
Chemom. Intell. Lab. Syst. (1987)
Chemom. Intell. Lab. Syst. (1987)
et al., Comput. Geosci. (1993)
et al., Anal. Chim. Acta (2001)
et al., Chemom. Intell. Lab. Syst. (2002)
Chemom. Intell. Lab. Syst. (2002)
et al., Chemom. Intell. Lab. Syst. (2002)
Inf. Control (1965)
et al., Comput. Geosci. (1984)
et al., Chemosphere (2000)
Pattern Recognition Lett.
Talanta
Chemometrics: A textbook
Chemometrics: Applications of Mathematics and Statistics to the Laboratory
Chemometrics in Environmental Analysis