Published in: Advances in Data Analysis and Classification 1/2015

01-03-2015 | Regular Article

Principal component analysis for probabilistic symbolic data: a more generic and accurate algorithm

Authors: Meiling Chen, Huiwen Wang, Zhongfeng Qin



Abstract

In the symbolic data framework, probabilistic symbolic data are data whose components are random variables with general probability distributions. Intervals (uniform distributions), histograms (empirical distributions), Gaussian distributions and chi-squared distributions are all special cases. Existing approaches share a common shortcoming: they cannot obtain the distributions of linear combinations (i.e., principal components) of random variables, especially when the variables are not identically distributed. This paper overcomes that shortcoming by using the inversion theorem to provide an exact probability density function for each principal component. Further, the paper defines a covariance matrix for probabilistic symbolic data and presents a new principal component analysis based on this variance–covariance structure. The effectiveness of the proposed method is illustrated by a simulated numerical experiment and two real-life cases: clustering of oils and fats data, and evaluation of journals indexed in the Science Citation Index.
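To make the inversion idea concrete, here is a minimal Python sketch of how a density for a linear combination of non-identically distributed variables can be recovered from characteristic functions. It is not the authors' algorithm, which the abstract does not spell out. It assumes the two components are independent, so that the characteristic function of the combination factorises, and it uses illustrative choices that do not come from the paper: a Uniform(0, 1) variable, a standard Gaussian variable, loadings `a1 = 0.8` and `a2 = 0.6`, and the helper names `cf_uniform`, `cf_normal`, `cf_combination`, `density_by_inversion` and `closed_form_density`.

```python
import math
import numpy as np


def cf_uniform(t, low, high):
    """Characteristic function of Uniform(low, high); equals 1 at t = 0."""
    t = np.asarray(t, dtype=float)
    out = np.ones(t.shape, dtype=complex)
    nz = t != 0
    tn = t[nz]
    out[nz] = (np.exp(1j * tn * high) - np.exp(1j * tn * low)) / (1j * tn * (high - low))
    return out


def cf_normal(t, mu, sigma):
    """Characteristic function of Normal(mu, sigma^2)."""
    t = np.asarray(t, dtype=float)
    return np.exp(1j * mu * t - 0.5 * (sigma * t) ** 2)


def cf_combination(t, a1, a2):
    """CF of Y = a1*U + a2*Z with U ~ Uniform(0, 1), Z ~ Normal(0, 1).

    Assumption: U and Z are independent, so the CF of the sum is the
    product of the component CFs evaluated at the scaled arguments.
    """
    t = np.asarray(t, dtype=float)
    return cf_uniform(a1 * t, 0.0, 1.0) * cf_normal(a2 * t, 0.0, 1.0)


def density_by_inversion(y, cf, t_max=40.0, n=4001):
    """Inversion formula f(y) = (1 / 2*pi) * integral of exp(-i*t*y) * cf(t) dt.

    The integral is truncated to [-t_max, t_max] and evaluated with the
    trapezoidal rule; this is adequate here because the Gaussian factor
    makes the CF decay very quickly.
    """
    t = np.linspace(-t_max, t_max, n)
    dt = t[1] - t[0]
    y = np.atleast_1d(np.asarray(y, dtype=float))
    integrand = np.exp(-1j * np.outer(y, t)) * cf(t)
    integral = (integrand.sum(axis=1) - 0.5 * (integrand[:, 0] + integrand[:, -1])) * dt
    return integral.real / (2.0 * np.pi)


def closed_form_density(y, a1, a2):
    """Known density of a1*U + a2*Z (a1, a2 > 0), used only as a check."""
    def std_normal_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (std_normal_cdf(y / a2) - std_normal_cdf((y - a1) / a2)) / a1


# Illustrative loadings (hypothetical, not taken from the paper).
a1, a2 = 0.8, 0.6
ys = np.linspace(-4.0, 5.0, 181)
pdf = density_by_inversion(ys, lambda t: cf_combination(t, a1, a2))
```

For this particular pair of distributions a closed form exists, so `closed_form_density` can be used to check the numerical inversion. The same `density_by_inversion` routine applies to any combination whose characteristic function is available and decays fast enough for the truncated integral to be accurate.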


Metadata
Title
Principal component analysis for probabilistic symbolic data: a more generic and accurate algorithm
Authors
Meiling Chen
Huiwen Wang
Zhongfeng Qin
Publication date
01-03-2015
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 1/2015
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-014-0178-2
