2021 | OriginalPaper | Book chapter

The Information-Geometric Perspective of Compositional Data Analysis

Written by: Ionas Erb, Nihat Ay

Published in: Advances in Compositional Data Analysis

Publisher: Springer International Publishing

Abstract

Information geometry uses the formal tools of differential geometry to describe the space of probability distributions as a Riemannian manifold with an additional dual structure. The formal equivalence of compositional data with discrete probability distributions makes it possible to apply the same description to the sample space of Compositional Data Analysis (CoDA). The latter has been formally described as a Euclidean space with an orthonormal basis featuring components that are suitable combinations of the original parts. In contrast to the Euclidean metric, the information-geometric description singles out the Fisher information metric as the only one keeping the manifold’s geometric structure invariant under equivalent representations of the underlying random variables. Well-known concepts that are valid in Euclidean coordinates, e.g., the Pythagorean theorem, are generalized by information geometry to corresponding notions that hold for more general coordinates. In briefly reviewing Euclidean CoDA and, in more detail, the information-geometric approach, we show how the latter justifies the use of distance measures and divergences that so far have received little attention in CoDA as they do not fit the Euclidean geometry favored by current thinking. We also show how Shannon entropy and relative entropy can describe amalgamations in a simple way, while Aitchison distance requires the use of geometric means to obtain more succinct relationships. We proceed to prove the information monotonicity property for Aitchison distance. We close with some thoughts about new directions in CoDA where the rich structure that is provided by information geometry could be exploited.
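The quantities named in the abstract can be tried out numerically. The following is a minimal sketch, not code from the chapter: the function names and the example compositions are assumptions made here for illustration. It computes the Aitchison distance (Euclidean distance between centred log-ratio vectors), the Fisher-Rao geodesic distance on the simplex, and the relative entropy (Kullback-Leibler divergence), and then checks that relative entropy does not increase when parts are amalgamated.

```python
import math

def closure(x):
    """Rescale positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def clr(x):
    """Centred log-ratio transform: log parts minus their mean."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [l - mean for l in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr vectors of x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

def fisher_rao_distance(p, q):
    """Geodesic distance induced by the Fisher metric on the simplex."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 2.0 * math.acos(min(1.0, bc))  # clamp guards against rounding

def kl_divergence(p, q):
    """Relative entropy of p with respect to q (natural logarithm)."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def amalgamate(p, groups):
    """Sum the parts within each group of indices (hypothetical helper)."""
    return [sum(p[i] for i in g) for g in groups]

# Illustrative data only; not taken from the chapter.
p = closure([10.0, 30.0, 40.0, 20.0])
q = closure([25.0, 25.0, 15.0, 35.0])
groups = [(0, 1), (2, 3)]
p_am, q_am = amalgamate(p, groups), amalgamate(q, groups)

print("Aitchison distance:", aitchison_distance(p, q))
print("Fisher-Rao distance:", fisher_rao_distance(p, q))
print("KL before amalgamation:", kl_divergence(p, q))
print("KL after amalgamation:", kl_divergence(p_am, q_am))
```

Note that the amalgamated vectors still sum to 1, so no renormalization step appears in `amalgamate`.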

Footnotes
1
The score function plays an important role in maximum-likelihood estimation.
 
2
As an example, the translation invariance of distance measures, known under the name of perturbation invariance in CoDA, has its information-geometric analog in the invariance of the inner product of tangent vectors \(\mathbf{u}\), \(\mathbf{v}\) under parallel transport \(P\) and its dual \(P^*\): \(\langle \mathbf{u}, \mathbf{v}\rangle = \langle P(\mathbf{u}), P^*(\mathbf{v})\rangle\).
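The CoDA side of this statement, perturbation invariance of the Aitchison distance, can be checked numerically. This is a small sketch under assumptions made here (function names and example vectors are illustrative, and the parallel-transport side is not shown): perturbing two compositions by the same third composition leaves their Aitchison distance unchanged.

```python
import math

def closure(x):
    """Rescale positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]

def perturb(x, z):
    """Perturbation: component-wise product followed by closure."""
    return closure([a * b for a, b in zip(x, z)])

def clr(x):
    """Centred log-ratio transform."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [l - mean for l in logs]

def aitchison_distance(x, y):
    """Euclidean distance between clr vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

# Illustrative compositions; z plays the role of the "translation".
x = closure([1.0, 2.0, 7.0])
y = closure([3.0, 3.0, 4.0])
z = closure([5.0, 1.0, 2.0])

d_before = aitchison_distance(x, y)
d_after = aitchison_distance(perturb(x, z), perturb(y, z))
print(d_before, d_after)  # the two values agree up to rounding
```

The invariance follows because perturbation adds the same clr vector to both points, which cancels in the difference.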
 
3
The Riemannian distance between two points on a manifold is the infimum of the lengths of all the piecewise smooth paths joining the two points.
 
4
This can be seen as the high-temperature limit in statistical physics.
 
5
A divergence is decomposable if it can be written as a sum of terms that only depend on individual components.
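As an illustration (the symbol \(d\) for the per-component term is notation assumed here, not taken from the chapter), a decomposable divergence has the form shown on the left, and relative entropy is a familiar instance:

\[
D(\mathbf{p}\,\|\,\mathbf{q}) = \sum_{i} d(p_i, q_i),
\qquad \text{e.g.} \quad
\mathrm{KL}(\mathbf{p}\,\|\,\mathbf{q}) = \sum_{i} p_i \log\frac{p_i}{q_i}.
\]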
 
6
Note, however, that this mapping does not yield the spherical representative of the composition in the sense of the definition of an equivalence class.
 
7
Subcompositional coherence, i.e., the fundamental requirement that quantities remain identical on a renormalized subcomposition, is not an issue for amalgamation: after amalgamation, there is no need for renormalization.
 
8
It is interesting to note that the two interaction terms have the form of squares of the balance and pivot coordinates mentioned in Sect. 2.
 
9
The product over parts specifies the probability that all events in the subset occur, but this is then re-scaled by the exponent to the probability of a single event.
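A small numerical sketch of this reading of the geometric mean follows. It assumes, for illustration only, that the parts are probabilities of independent events (so that their product is a joint probability); the names and numbers are hypothetical.

```python
import math

def geometric_mean(parts):
    """Product of the parts, rescaled by the exponent 1/k."""
    return math.prod(parts) ** (1.0 / len(parts))

def geometric_mean_via_logs(parts):
    """Same quantity computed as the exponential of the mean log."""
    return math.exp(sum(math.log(p) for p in parts) / len(parts))

# Illustrative subset of parts; not taken from the chapter.
subset = [0.2, 0.05, 0.1]

joint = math.prod(subset)    # probability that all events occur (under independence)
g = geometric_mean(subset)   # rescaled to the level of a single event

print("product over parts:", joint)
print("geometric mean:", g)
```

The geometric mean always lies between the smallest and the largest part, which is what makes it usable as a single representative value for the subset.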
 
Metadata
Title
The Information-Geometric Perspective of Compositional Data Analysis
Written by
Ionas Erb
Nihat Ay
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-71175-7_2