Published in: Advances in Data Analysis and Classification 3/2022

03.07.2021 | Regular Article

Association measures for interval variables

Authors: M. Rosário Oliveira, Margarida Azeitona, António Pacheco, Rui Valadas


Abstract

Symbolic Data Analysis (SDA) is a relatively new field of statistics that extends conventional data analysis by taking into account intrinsic data variability and structure. Unlike conventional data analysis, in SDA the features characterizing the data can be multi-valued, such as intervals or histograms. SDA has been mainly approached from a sampling perspective. In this work, we propose a model that links the micro-data and macro-data of interval-valued symbolic variables, which takes a populational perspective. Using this model, we derive the micro-data assumptions underlying the various definitions of symbolic covariance matrices proposed in the literature, and show that these assumptions can be too restrictive, raising applicability concerns. We analyze the various definitions using worked examples and four datasets. Our results show that the existence/absence of correlations in the macro-data may not be correctly captured by the definitions of symbolic covariance matrices and that, in real data, there can be a strong divergence between these definitions. Thus, in order to select the most appropriate definition, one must have some knowledge about the micro-data structure.


Metadata
Title
Association measures for interval variables
Authors
M. Rosário Oliveira
Margarida Azeitona
António Pacheco
Rui Valadas
Publication date
03.07.2021
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 3/2022
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-021-00445-8
