Published in: Advances in Data Analysis and Classification 1/2019

3 February 2018 | Regular Article

Robust clustering for functional data based on trimming and constraints

Authors: Diego Rivera-García, Luis A. García-Escudero, Agustín Mayo-Iscar, Joaquín Ortega

Abstract

Many clustering algorithms for data consisting of curves or functions have recently been proposed. However, contamination in the sample of curves can degrade the performance of most of them. In this work we propose a robust, model-based clustering method that relies on an approximation to the “density function” for functional data. The robustness follows from the joint application of data-driven trimming, which reduces the effect of contaminated observations, and constraints on the variances, which avoid spurious clusters in the solution. The algorithm is designed to perform clustering and outlier detection simultaneously by maximizing a trimmed “pseudo” likelihood. The proposed method is evaluated and compared with existing methods through a simulation study, which shows better performance for the proposed methodology when a fraction of contaminating curves is added to a non-contaminated sample. Finally, an application to a real data set previously considered in the literature is given.
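
To make the two ingredients named in the abstract concrete, the following is a minimal sketch of trimming combined with a variance-ratio constraint. It is not the authors' algorithm: it assumes each curve has already been reduced to a finite vector (for example, basis coefficients), uses spherical Gaussian clusters, and enforces the constraint by simply raising small variances, a simplification chosen for brevity. The function name, its parameters (`alpha` for the trimming proportion, `c` for the bound on the variance ratio) and the toy data are all hypothetical.

```python
import numpy as np

def trimmed_constrained_clustering(X, k, alpha, c, n_starts=20, n_iter=50, seed=0):
    """Toy trimmed classification-likelihood clustering (illustrative only).

    X     : (n, p) array, e.g. basis coefficients of curves (assumption)
    alpha : proportion of observations to trim, in [0, 1)
    c     : upper bound on max(variance) / min(variance), c >= 1
    Returns (labels, objective); label -1 marks a trimmed observation.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_keep = n - int(np.floor(n * alpha))
    best_labels, best_obj = None, -np.inf

    for _ in range(n_starts):
        # Random start: k distinct observations as centres, equal variances.
        mu = X[rng.choice(n, size=k, replace=False)].astype(float)
        var = np.full(k, X.var())
        w = np.full(k, 1.0 / k)

        for _ in range(n_iter):
            # log(weight * spherical Gaussian density), per observation and cluster.
            d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
            logd = np.log(w) - 0.5 * p * np.log(2 * np.pi * var) - d2 / (2 * var)
            assign = logd.argmax(axis=1)
            score = logd.max(axis=1)

            # Trimming: keep only the n_keep best-fitting observations.
            keep = np.argsort(score)[::-1][:n_keep]
            labels = np.full(n, -1)
            labels[keep] = assign[keep]
            obj = score[keep].sum()  # trimmed log-likelihood of this partition

            # Re-estimate parameters from the non-trimmed observations.
            for j in range(k):
                members = X[labels == j]
                if len(members) == 0:
                    w[j] = 1e-12  # empty cluster: keep old centre, tiny weight
                    continue
                mu[j] = members.mean(axis=0)
                var[j] = ((members - mu[j]) ** 2).sum() / (len(members) * p)
                w[j] = len(members) / n_keep

            # Constraint (simplified): raise small variances so that max/min <= c.
            var = np.maximum(var, max(var.max() / c, 1e-12))

        if obj > best_obj:
            best_labels, best_obj = labels, obj

    return best_labels, best_obj

# Toy data: two groups of 50 and 45 points plus 5 far-away contaminating points.
data_rng = np.random.default_rng(1)
group_a = data_rng.normal([0.0, 0.0], 1.0, size=(50, 2))
group_b = data_rng.normal([10.0, 0.0], 1.0, size=(45, 2))
outliers = np.array([[5, 40], [5, -40], [-30, 5], [40, 5], [5, 60]], dtype=float)
X = np.vstack([group_a, group_b, outliers])

labels, objective = trimmed_constrained_clustering(X, k=2, alpha=0.05, c=4.0)
print("trimmed indices:", np.where(labels == -1)[0])
```

In this sketch the trimmed observations are chosen by the data themselves at every iteration (those with the lowest fitted density), while the bound `c` keeps any cluster from collapsing onto a handful of points with near-zero variance.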

Metadata
Title
Robust clustering for functional data based on trimming and constraints
Authors
Diego Rivera-García
Luis A. García-Escudero
Agustín Mayo-Iscar
Joaquín Ortega
Publication date
3 February 2018
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 1/2019
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-018-0312-7