Published in: Journal of Classification 1/2022

22.11.2021

High-Dimensional Clustering via Random Projections

Authors: Laura Anderlucci, Francesca Fortunato, Angela Montanari



Abstract

This work addresses the unsupervised classification problem for high-dimensional data by exploiting the general idea of a Random Projection Ensemble. Specifically, we propose to generate a set of independent low-dimensional random projections and to perform model-based clustering on each of them. The top B projections, i.e., the projections that show the best grouping structure, are then retained. The final partition is obtained by aggregating the clusters found in the projections via consensus. The performance of the method is assessed on both real and simulated datasets. The results obtained suggest that the proposal is a promising tool for high-dimensional clustering.
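The procedure described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes Gaussian random projections with orthonormalised columns, scores each projection's Gaussian mixture fit by BIC as a stand-in for "best grouping structure", and aggregates the top B partitions through a co-association (consensus) matrix cut with average-linkage clustering. All function and parameter names are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def rp_ensemble_cluster(X, n_clusters=3, d=2, n_proj=20, top_b=5, seed=0):
    """Cluster X via model-based clustering on random projections:
    fit a Gaussian mixture on each projection, keep the top_b
    projections by BIC, and aggregate the partitions by consensus."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    results = []
    for _ in range(n_proj):
        # Gaussian random projection matrix, columns orthonormalised
        R = rng.standard_normal((p, d))
        Q, _ = np.linalg.qr(R)
        Z = X @ Q
        gm = GaussianMixture(n_components=n_clusters, random_state=0).fit(Z)
        results.append((gm.bic(Z), gm.predict(Z)))
    # retain the top_b projections with the lowest BIC
    results.sort(key=lambda t: t[0])
    partitions = [lab for _, lab in results[:top_b]]
    # co-association matrix: fraction of retained projections that
    # place each pair of observations in the same cluster
    co = np.zeros((n, n))
    for lab in partitions:
        co += (lab[:, None] == lab[None, :])
    co /= top_b
    # consensus partition: cut an average-linkage tree on 1 - co
    dist = squareform(1.0 - co, checks=False)
    return fcluster(linkage(dist, method="average"),
                    n_clusters, criterion="maxclust")
```

On well-separated simulated groups, the consensus partition recovers the generating structure even though each individual mixture model only ever sees a d-dimensional shadow of the data.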


Footnotes
1
See the R Documentation for the sort function with default settings.
 
Metadata
Title
High-Dimensional Clustering via Random Projections
Authors
Laura Anderlucci
Francesca Fortunato
Angela Montanari
Publication date
22.11.2021
Publisher
Springer US
Published in
Journal of Classification / Issue 1/2022
Print ISSN: 0176-4268
Electronic ISSN: 1432-1343
DOI
https://doi.org/10.1007/s00357-021-09403-7
