Published in: Annals of Data Science 2/2015

1 June 2015

A Comprehensive Survey of Clustering Algorithms

Written by: Dongkuan Xu, Yingjie Tian

Abstract

Data analysis is a common method in modern scientific research, spanning communication science, computer science, and biology. Clustering, as a basic component of data analysis, plays a significant role. On the one hand, many tools for cluster analysis have been created as the volume of information has grown and disciplines have intersected. On the other hand, each clustering algorithm has its own strengths and weaknesses, owing to the complexity of information. In this review paper, we begin with the definition of clustering, consider the basic elements involved in the clustering process, such as distance or similarity measures and evaluation indicators, and analyze clustering algorithms from two perspectives: the traditional ones and the modern ones. All the discussed clustering algorithms are compared in detail and summarized in Appendix Table 22.

Literatur
1.
Zurück zum Zitat Jain A, Dubes R (1988) Algorithms for clustering data. Prentice-Hall, Inc, Upper Saddle RiverMATH Jain A, Dubes R (1988) Algorithms for clustering data. Prentice-Hall, Inc, Upper Saddle RiverMATH
2.
3.
Zurück zum Zitat Everitt B, Landau S, Leese M (2001) Clustering analysis, 4th edn. Arnold, London Everitt B, Landau S, Leese M (2001) Clustering analysis, 4th edn. Arnold, London
4.
Zurück zum Zitat Gower J (1971) A general coefficient of similarity and some of its properties. Biometrics 27:857–871CrossRef Gower J (1971) A general coefficient of similarity and some of its properties. Biometrics 27:857–871CrossRef
5.
Zurück zum Zitat Estivill-Castro V (2002) Why so many clustering algorithms: a position paper. ACM SIGKDD Explor Newsl 4:65–75CrossRef Estivill-Castro V (2002) Why so many clustering algorithms: a position paper. ACM SIGKDD Explor Newsl 4:65–75CrossRef
6.
Zurück zum Zitat Färber I, Günnemann S, Kriegel H, Kröger P, Müller E, Schubert E, Seidl T, Zimek A (2010) On using class-labels in evaluation of clusterings. In MultiClust: 1st international workshop on discovering, summarizing and using multiple clusterings held in conjunction with KDD, Washington, DC Färber I, Günnemann S, Kriegel H, Kröger P, Müller E, Schubert E, Seidl T, Zimek A (2010) On using class-labels in evaluation of clusterings. In MultiClust: 1st international workshop on discovering, summarizing and using multiple clusterings held in conjunction with KDD, Washington, DC
7.
Zurück zum Zitat MacQueen J (1967) Some methods for classification and analysis of multivariate observations. Proc Fifth Berkeley Symp Math Stat Probab 1:281–297MathSciNet MacQueen J (1967) Some methods for classification and analysis of multivariate observations. Proc Fifth Berkeley Symp Math Stat Probab 1:281–297MathSciNet
8.
Zurück zum Zitat Park H, Jun C (2009) A simple and fast algorithm for K-medoids clustering. Expert Syst Appl 36:3336–3341CrossRef Park H, Jun C (2009) A simple and fast algorithm for K-medoids clustering. Expert Syst Appl 36:3336–3341CrossRef
9.
Zurück zum Zitat Kaufman L, Rousseeuw P (1990) Partitioning around medoids (program pam). Finding groups in data: an introduction to cluster analysis. Wiley, HobokenCrossRef Kaufman L, Rousseeuw P (1990) Partitioning around medoids (program pam). Finding groups in data: an introduction to cluster analysis. Wiley, HobokenCrossRef
11.
Zurück zum Zitat Ng R, Han J (2002) Clarans: a method for clustering objects for spatial data mining. IEEE Trans Knowl Data Eng 14:1003–1016CrossRef Ng R, Han J (2002) Clarans: a method for clustering objects for spatial data mining. IEEE Trans Knowl Data Eng 14:1003–1016CrossRef
12.
Zurück zum Zitat Boley D, Gini M, Gross R, Han E, Hastings K, Karypis G, Kumar V, Mobasher B, Moore J (1999) Partitioning-based clustering for web document categorization. Decis Support Syst 27:329–341CrossRef Boley D, Gini M, Gross R, Han E, Hastings K, Karypis G, Kumar V, Mobasher B, Moore J (1999) Partitioning-based clustering for web document categorization. Decis Support Syst 27:329–341CrossRef
13.
Zurück zum Zitat Jain A (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31:651–666CrossRef Jain A (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31:651–666CrossRef
14.
Zurück zum Zitat Velmurugan T, Santhanam T (2011) A survey of partition based clustering algorithms in data mining: an experimental approach. Inf Technol J 10:478–484CrossRef Velmurugan T, Santhanam T (2011) A survey of partition based clustering algorithms in data mining: an experimental approach. Inf Technol J 10:478–484CrossRef
17.
Zurück zum Zitat Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. ACM SIGMOD Rec 25:103–104CrossRef Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. ACM SIGMOD Rec 25:103–104CrossRef
18.
Zurück zum Zitat Guha S, Rastogi R, Shim K (1998) CURE: an efficient clustering algorithm for large databases. ACM SIGMOD Rec 27:73–84CrossRef Guha S, Rastogi R, Shim K (1998) CURE: an efficient clustering algorithm for large databases. ACM SIGMOD Rec 27:73–84CrossRef
19.
Zurück zum Zitat Guha S, Rastogi R, Shim K (1999) ROCK: a robust clustering algorithm for categorical attributes. In: Proceedings of the 15th international conference on data engineering, pp 512-521 Guha S, Rastogi R, Shim K (1999) ROCK: a robust clustering algorithm for categorical attributes. In: Proceedings of the 15th international conference on data engineering, pp 512-521
20.
Zurück zum Zitat Karypis G, Han E, Kumar V (1999) Chameleon: hierarchical clustering using dynamic modeling. Computer 32:68–75CrossRef Karypis G, Han E, Kumar V (1999) Chameleon: hierarchical clustering using dynamic modeling. Computer 32:68–75CrossRef
21.
Zurück zum Zitat Murtagh F (1983) A survey of recent advances in hierarchical clustering algorithms. Comput J 26:354–359MATHCrossRef Murtagh F (1983) A survey of recent advances in hierarchical clustering algorithms. Comput J 26:354–359MATHCrossRef
22.
Zurück zum Zitat Carlsson G, Mémoli F (2010) Characterization, stability and convergence of hierarchical clustering methods. J Mach Learn Res 11:1425–1470MATHMathSciNet Carlsson G, Mémoli F (2010) Characterization, stability and convergence of hierarchical clustering methods. J Mach Learn Res 11:1425–1470MATHMathSciNet
23.
Zurück zum Zitat Dunn J (1973) A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybern 3:32–57MATHMathSciNetCrossRef Dunn J (1973) A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybern 3:32–57MATHMathSciNetCrossRef
24.
Zurück zum Zitat Bezdek J (1981) Pattern recognition with fuzzy objective function algorithms. Plenum, New YorkMATHCrossRef Bezdek J (1981) Pattern recognition with fuzzy objective function algorithms. Plenum, New YorkMATHCrossRef
25.
Zurück zum Zitat Bezdek J, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10:191–203ADSCrossRef Bezdek J, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10:191–203ADSCrossRef
26.
Zurück zum Zitat Dave R, Bhaswan K (1992) Adaptive fuzzy c-shells clustering and detection of ellipses. IEEE Trans Neural Netw 3:643–662PubMedCrossRef Dave R, Bhaswan K (1992) Adaptive fuzzy c-shells clustering and detection of ellipses. IEEE Trans Neural Netw 3:643–662PubMedCrossRef
27.
Zurück zum Zitat Yager R, Filev D (1994) Approximate clustering via the mountain method. IEEE Trans Syst Man Cybern 24:1279–1284CrossRef Yager R, Filev D (1994) Approximate clustering via the mountain method. IEEE Trans Syst Man Cybern 24:1279–1284CrossRef
29.
Zurück zum Zitat Baraldi A, Blonda P (1999) A survey of fuzzy clustering algorithms for pattern recognition. I. IEEE Trans Syst Man Cybern Part B 29:778–785CrossRef Baraldi A, Blonda P (1999) A survey of fuzzy clustering algorithms for pattern recognition. I. IEEE Trans Syst Man Cybern Part B 29:778–785CrossRef
30.
Zurück zum Zitat Höppner F (1999) Fuzzy cluster analysis: methods for classification, data analysis and image recognition. Wiley, HobokenMATH Höppner F (1999) Fuzzy cluster analysis: methods for classification, data analysis and image recognition. Wiley, HobokenMATH
31.
Zurück zum Zitat Xu X, Ester M, Kriegel H, Sander J (1998) A distribution-based clustering algorithm for mining in large spatial databases. In: Proceedings of the fourteenth international conference on data engineering, pp 324-331 Xu X, Ester M, Kriegel H, Sander J (1998) A distribution-based clustering algorithm for mining in large spatial databases. In: Proceedings of the fourteenth international conference on data engineering, pp 324-331
32.
Zurück zum Zitat Rasmussen C (1999) The infinite Gaussian mixture model. Adv Neural Inf Process Syst 12:554–560 Rasmussen C (1999) The infinite Gaussian mixture model. Adv Neural Inf Process Syst 12:554–560
33.
Zurück zum Zitat Preheim S, Perrotta A, Martin-Platero A, Gupta A, Alm E (2013) Distribution-based clustering: using ecology to refine the operational taxonomic unit. Appl Environ Microbiol 79:6593–6603PubMedCentralPubMedCrossRef Preheim S, Perrotta A, Martin-Platero A, Gupta A, Alm E (2013) Distribution-based clustering: using ecology to refine the operational taxonomic unit. Appl Environ Microbiol 79:6593–6603PubMedCentralPubMedCrossRef
34.
Zurück zum Zitat Jiang B, Pei J, Tao Y, Lin X (2013) Clustering uncertain data based on probability distribution similarity. IEEE Trans Knowl Data Eng 25:751–763CrossRef Jiang B, Pei J, Tao Y, Lin X (2013) Clustering uncertain data based on probability distribution similarity. IEEE Trans Knowl Data Eng 25:751–763CrossRef
35.
Zurück zum Zitat Kriegel H, Kröger P, Sander J, Zimek A (2011) Densitybased clustering. Wiley Interdiscip Rev 1:231–240 Kriegel H, Kröger P, Sander J, Zimek A (2011) Densitybased clustering. Wiley Interdiscip Rev 1:231–240
36.
Zurück zum Zitat Ester M, Kriegel H, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second ACM SIGKDD international conference on knowledge discovery and data mining, pp 226–231 Ester M, Kriegel H, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second ACM SIGKDD international conference on knowledge discovery and data mining, pp 226–231
37.
Zurück zum Zitat Ankerst M, Breunig M, Kriegel H, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: Proceedings on 1999 ACM SIGMOD international conference on management of data, vol 28, pp 49–60 Ankerst M, Breunig M, Kriegel H, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: Proceedings on 1999 ACM SIGMOD international conference on management of data, vol 28, pp 49–60
38.
Zurück zum Zitat Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24:603–619CrossRef Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24:603–619CrossRef
39.
Zurück zum Zitat Januzaj E, Kriegel H, Pfeifle M (2004) Scalable density-based distributed clustering. In: Proceedings of the 8th european conference on principles and practice of knowledge discovery in databases, pp 231–244 Januzaj E, Kriegel H, Pfeifle M (2004) Scalable density-based distributed clustering. In: Proceedings of the 8th european conference on principles and practice of knowledge discovery in databases, pp 231–244
40.
Zurück zum Zitat Kriegel H, Pfeifle M (2005) Density-based clustering of uncertain data. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining, pp 672–677 Kriegel H, Pfeifle M (2005) Density-based clustering of uncertain data. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining, pp 672–677
41.
Zurück zum Zitat Chen Y, Tu L (2007) Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp 133–142 Chen Y, Tu L (2007) Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp 133–142
42.
Zurück zum Zitat Duan L, Xu L, Guo F, Lee J, Yan B (2007) A local-density based spatial clustering algorithm with noise. Inf Syst 32:978–986CrossRef Duan L, Xu L, Guo F, Lee J, Yan B (2007) A local-density based spatial clustering algorithm with noise. Inf Syst 32:978–986CrossRef
43.
Zurück zum Zitat Hinneburg A, Keim D (1998) An efficient approach to clustering in large multimedia databases with noise. In Proceedings of the 4th ACM SIGKDD international conference on knowledge discovery and data mining 98: 58–65 Hinneburg A, Keim D (1998) An efficient approach to clustering in large multimedia databases with noise. In Proceedings of the 4th ACM SIGKDD international conference on knowledge discovery and data mining 98: 58–65
44.
Zurück zum Zitat Sharan R, Shamir R (2000) CLICK: a clustering algorithm with applications to gene expression analysis. In: Proc international conference intelligent systems molecular biolgy, pp 307–316 Sharan R, Shamir R (2000) CLICK: a clustering algorithm with applications to gene expression analysis. In: Proc international conference intelligent systems molecular biolgy, pp 307–316
45.
Zurück zum Zitat Jain A, Murty M, Flynn P (1999) Data clustering: a review. ACM Comput Surv (CSUR) 31:264–323CrossRef Jain A, Murty M, Flynn P (1999) Data clustering: a review. ACM Comput Surv (CSUR) 31:264–323CrossRef
46.
Zurück zum Zitat Ben-Dor A, Shamir R, Yakhini Z (1999) Clustering gene expression patterns. J Comput Biol 6:281–297PubMedCrossRef Ben-Dor A, Shamir R, Yakhini Z (1999) Clustering gene expression patterns. J Comput Biol 6:281–297PubMedCrossRef
48.
Zurück zum Zitat Estivill-Castro V, Lee I (2000) Amoeba: hierarchical clustering based on spatial proximity using delaunay diagram. In: Proceedings of the 9th international symposium on spatial data handling, Beijing Estivill-Castro V, Lee I (2000) Amoeba: hierarchical clustering based on spatial proximity using delaunay diagram. In: Proceedings of the 9th international symposium on spatial data handling, Beijing
49.
Zurück zum Zitat Cherng J, Lo M (2001) A hypergraph based clustering algorithm for spatial data sets. In: Proceedings of the 2001 IEEE international conference on data mining, pp 83–90 Cherng J, Lo M (2001) A hypergraph based clustering algorithm for spatial data sets. In: Proceedings of the 2001 IEEE international conference on data mining, pp 83–90
50.
Zurück zum Zitat Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22:888–905CrossRef Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22:888–905CrossRef
51.
Zurück zum Zitat Ng A, Jordan M, Weiss Y (2002) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 2:849–856 Ng A, Jordan M, Weiss Y (2002) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 2:849–856
52.
Zurück zum Zitat Wang W, Yang J, Muntz R (1997) STING: a statistical information grid approach to spatial data mining. In VLDB, pp 186–195 Wang W, Yang J, Muntz R (1997) STING: a statistical information grid approach to spatial data mining. In VLDB, pp 186–195
53.
Zurück zum Zitat Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings 1998 ACM sigmod international conference on management of data, vol 27, pp 94–105 Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings 1998 ACM sigmod international conference on management of data, vol 27, pp 94–105
54.
Zurück zum Zitat Sheikholeslami G, Chatterjee S, Zhang A (1998) Wavecluster: A multi-resolution clustering approach for very large spatial databases. In: VLDB, pp 428–439 Sheikholeslami G, Chatterjee S, Zhang A (1998) Wavecluster: A multi-resolution clustering approach for very large spatial databases. In: VLDB, pp 428–439
55.
Zurück zum Zitat Ma E, Chow T (2004) A new shifting grid clustering algorithm. Pattern Recognit 37:503–514MATHCrossRef Ma E, Chow T (2004) A new shifting grid clustering algorithm. Pattern Recognit 37:503–514MATHCrossRef
56.
Zurück zum Zitat Park N, Lee W (2004) Statistical grid-based clustering over data streams. ACM SIGMOD Rec 33:32–37CrossRef Park N, Lee W (2004) Statistical grid-based clustering over data streams. ACM SIGMOD Rec 33:32–37CrossRef
57.
Zurück zum Zitat Pilevar A, Sukumar M (2005) GCHL: a grid-clustering algorithm for high-dimensional very large spatial data bases. Pattern Recognit Lett 26:999–1010CrossRef Pilevar A, Sukumar M (2005) GCHL: a grid-clustering algorithm for high-dimensional very large spatial data bases. Pattern Recognit Lett 26:999–1010CrossRef
58.
Zurück zum Zitat Mandelbrot B (1983) The fractal geometry of nature. Macmillan, London Mandelbrot B (1983) The fractal geometry of nature. Macmillan, London
59.
Zurück zum Zitat Barbará D, Chen P (2000) Using the fractal dimension to cluster datasets. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, pp 260–264 Barbará D, Chen P (2000) Using the fractal dimension to cluster datasets. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, pp 260–264
60.
Zurück zum Zitat Zhang A, Cheng B, Acharya R (1996) A fractal-based clustering approach in large visual database systems. In Representation and retrieval of visual media in, multimedia systems, pp 49–68 Zhang A, Cheng B, Acharya R (1996) A fractal-based clustering approach in large visual database systems. In Representation and retrieval of visual media in, multimedia systems, pp 49–68
61.
Zurück zum Zitat Menascé D, Abrahao B, Barbará D, Almeida V, Ribeiro F (2002) Fractal characterization of web workloads. In: Proceedings of the “ Web Engineering” Track of WWW2002, pp 7–11 Menascé D, Abrahao B, Barbará D, Almeida V, Ribeiro F (2002) Fractal characterization of web workloads. In: Proceedings of the “ Web Engineering” Track of WWW2002, pp 7–11
62.
Zurück zum Zitat Barry R, Kinsner W (2004) Multifractal characterization for classification of network traffic. Conf Electr Comput Eng 3:1453–1457 Barry R, Kinsner W (2004) Multifractal characterization for classification of network traffic. Conf Electr Comput Eng 3:1453–1457
63.
Zurück zum Zitat Al-Shammary D, Khalil I, Tari Z (2014) A distributed aggregation and fast fractal clustering approach for SOAP traffic. J Netw Comput Appl 41:1–14CrossRef Al-Shammary D, Khalil I, Tari Z (2014) A distributed aggregation and fast fractal clustering approach for SOAP traffic. J Netw Comput Appl 41:1–14CrossRef
64.
Zurück zum Zitat Fisher D (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2:139–172ADS Fisher D (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2:139–172ADS
65.
Zurück zum Zitat KohonenKohonen T (1990) The self-organizing map. Proc IEEE 78:1464–1480CrossRef KohonenKohonen T (1990) The self-organizing map. Proc IEEE 78:1464–1480CrossRef
66.
Zurück zum Zitat Carpenter G, Grossberg S (1987) A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput Vis Gr Image Process 37:54–115MATHCrossRef Carpenter G, Grossberg S (1987) A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput Vis Gr Image Process 37:54–115MATHCrossRef
67.
Zurück zum Zitat Carpenter G, Grossberg S (1988) The ART of adaptive pattern recognition by a self-organizing neural network. Computer 21:77–88CrossRef Carpenter G, Grossberg S (1988) The ART of adaptive pattern recognition by a self-organizing neural network. Computer 21:77–88CrossRef
68.
Zurück zum Zitat Carpenter G, Grossberg S (1987) ART 2: self-organization of stable category recognition codes for analog input patterns. Appl Opt 26:4919–4930PubMedADSCrossRef Carpenter G, Grossberg S (1987) ART 2: self-organization of stable category recognition codes for analog input patterns. Appl Opt 26:4919–4930PubMedADSCrossRef
69.
Zurück zum Zitat Carpenter G, Grossberg S (1990) ART 3: hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural Netw 3:129–152CrossRef Carpenter G, Grossberg S (1990) ART 3: hierarchical search using chemical transmitters in self-organizing pattern recognition architectures. Neural Netw 3:129–152CrossRef
70.
Zurück zum Zitat Meilă M, Heckerman D (2001) An experimental comparison of model-based clustering methods. Mach Learn 42:9–29MATHCrossRef Meilă M, Heckerman D (2001) An experimental comparison of model-based clustering methods. Mach Learn 42:9–29MATHCrossRef
71.
Zurück zum Zitat Fraley C, Raftery A (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97:611–631MATHMathSciNetCrossRef Fraley C, Raftery A (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97:611–631MATHMathSciNetCrossRef
72.
Zurück zum Zitat McLachlan G, Bean R, Peel D (2002) A mixture model-based approach to the clustering of microarray expression data. Bioinformatics 18:413–422PubMedCrossRef McLachlan G, Bean R, Peel D (2002) A mixture model-based approach to the clustering of microarray expression data. Bioinformatics 18:413–422PubMedCrossRef
73.
Zurück zum Zitat Medvedovic M, Sivaganesan S (2002) Bayesian infinite mixture model based clustering of gene expression profiles. Bioinformatics 18:1194–1206PubMedCrossRef Medvedovic M, Sivaganesan S (2002) Bayesian infinite mixture model based clustering of gene expression profiles. Bioinformatics 18:1194–1206PubMedCrossRef
74.
Zurück zum Zitat Zhong S, Ghosh J (2003) A unified framework for model-based clustering. J Mach Learn Res 4:1001–1037MathSciNet Zhong S, Ghosh J (2003) A unified framework for model-based clustering. J Mach Learn Res 4:1001–1037MathSciNet
75.
Zurück zum Zitat McNicholas P, Murphy T (2010) Model-based clustering of microarray expression data via latent Gaussian mixture models. Bioinformatics 26:2705–2712PubMedCrossRef McNicholas P, Murphy T (2010) Model-based clustering of microarray expression data via latent Gaussian mixture models. Bioinformatics 26:2705–2712PubMedCrossRef
76.
Zurück zum Zitat Schölkopf B, Smola A, Müller K (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10:1299–1319CrossRef Schölkopf B, Smola A, Müller K (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10:1299–1319CrossRef
77.
Zurück zum Zitat MacDonald D, Fyfe C (2000) The kernel self-organising map. Proc Fourth Int Conf Knowl-Based Intell Eng Syst Allied Technol 1:317–320 MacDonald D, Fyfe C (2000) The kernel self-organising map. Proc Fourth Int Conf Knowl-Based Intell Eng Syst Allied Technol 1:317–320
78.
Zurück zum Zitat Wu Z, Xie W,Yu J (2003) Fuzzy c-means clustering algorithm based on kernel method. In: Proceedings of the fifth ICCIMA, pp 49–54 Wu Z, Xie W,Yu J (2003) Fuzzy c-means clustering algorithm based on kernel method. In: Proceedings of the fifth ICCIMA, pp 49–54
79.
Zurück zum Zitat Ben-Hur A, Horn D, Siegelmann H, Vapnik V (2002) Support vector clustering. J Mach Learn Res 2:125–137MATH Ben-Hur A, Horn D, Siegelmann H, Vapnik V (2002) Support vector clustering. J Mach Learn Res 2:125–137MATH
80.
Zurück zum Zitat Xu L, Neufeld J, Larson B, Schuurmans D (2004) Maximum margin clustering. In: Advances in neural information processing systems, pp 1537–1544 Xu L, Neufeld J, Larson B, Schuurmans D (2004) Maximum margin clustering. In: Advances in neural information processing systems, pp 1537–1544
81.
Zurück zum Zitat Zhao B, Kwok J, Zhang C (2009) Multiple kernel clustering. In SDM, pp 638–649 Zhao B, Kwok J, Zhang C (2009) Multiple kernel clustering. In SDM, pp 638–649
82.
Zurück zum Zitat Müller K, Mika S, Rätsch G, Tsuda K, Schölkopf B (2001) An introduction to kernel-based learning algorithms. IEEE Trans Neural Netw 12:181–201PubMedCrossRef Müller K, Mika S, Rätsch G, Tsuda K, Schölkopf B (2001) An introduction to kernel-based learning algorithms. IEEE Trans Neural Netw 12:181–201PubMedCrossRef
83.
Zurück zum Zitat Girolami M (2002) Mercer kernel-based clustering in feature space. IEEE Trans Neural Netw 13:780–784PubMedCrossRef Girolami M (2002) Mercer kernel-based clustering in feature space. IEEE Trans Neural Netw 13:780–784PubMedCrossRef
84.
Zurück zum Zitat Filippone M, Camastra F, Masulli F, Rovetta S (2008) A survey of kernel and spectral methods for clustering. Pattern Recognit 41:176–190MATHCrossRef Filippone M, Camastra F, Masulli F, Rovetta S (2008) A survey of kernel and spectral methods for clustering. Pattern Recognit 41:176–190MATHCrossRef
85.
Zurück zum Zitat Fred A, Jain A (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27:835–850PubMedCrossRef Fred A, Jain A (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27:835–850PubMedCrossRef
86.
Zurück zum Zitat Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617MATHMathSciNet Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617MATHMathSciNet
87.
Zurück zum Zitat Fern X, Brodley C (2003) Random projection for high dimensional data clustering: a cluster ensemble approach. ICML 3:186–193 Fern X, Brodley C (2003) Random projection for high dimensional data clustering: a cluster ensemble approach. ICML 3:186–193
88.
Zurück zum Zitat Dimitriadou E, Weingessel A, Hornik K (2001) Voting-merging: an ensemble method for clustering. In: ICANN, pp 217–224 Dimitriadou E, Weingessel A, Hornik K (2001) Voting-merging: an ensemble method for clustering. In: ICANN, pp 217–224
89.
Zurück zum Zitat Topchy A, Jain A, Punch W (2004) A mixture model for clustering ensembles. In: Proceedings of the SIAM international conference on data mining, pp 379 Topchy A, Jain A, Punch W (2004) A mixture model for clustering ensembles. In: Proceedings of the SIAM international conference on data mining, pp 379
90.
Zurück zum Zitat Topchy A, Jain A, Punch W (2005) Clustering ensembles: models of consensus and weak partitions. IEEE Trans Pattern Anal Mach Intell 27:1866–1881PubMedCrossRef Topchy A, Jain A, Punch W (2005) Clustering ensembles: models of consensus and weak partitions. IEEE Trans Pattern Anal Mach Intell 27:1866–1881PubMedCrossRef
91.
Zurück zum Zitat Yoon H, Ahn S, Lee S, Cho S, Kim J (2006) Heterogeneous clustering ensemble method for combining different cluster results. In: Data mining for biomedical applications, pp 82–92 Yoon H, Ahn S, Lee S, Cho S, Kim J (2006) Heterogeneous clustering ensemble method for combining different cluster results. In: Data mining for biomedical applications, pp 82–92
92.
Zurück zum Zitat Domeniconi C, Gunopulos D, Ma S, Yan B, Al-Razgan M, Papadopoulos D (2007) Locally adaptive metrics for clustering high dimensional data. Data Min Knowl Discov 14:63–97MathSciNetCrossRef Domeniconi C, Gunopulos D, Ma S, Yan B, Al-Razgan M, Papadopoulos D (2007) Locally adaptive metrics for clustering high dimensional data. Data Min Knowl Discov 14:63–97MathSciNetCrossRef
93.
Zurück zum Zitat Vega-Pons S, Correa-Morris J, Ruiz-Shulcloper J (2010) Weighted partition consensus via kernels. Pattern Recognit 43:2712–2724MATHCrossRef Vega-Pons S, Correa-Morris J, Ruiz-Shulcloper J (2010) Weighted partition consensus via kernels. Pattern Recognit 43:2712–2724MATHCrossRef
94.
Zurück zum Zitat Punera K, Ghosh J (2008) Consensus-based ensembles of soft clusterings. Appl Artif Intell 22:780–810CrossRef Punera K, Ghosh J (2008) Consensus-based ensembles of soft clusterings. Appl Artif Intell 22:780–810CrossRef
95.
Zurück zum Zitat Vega-Pons S, Ruiz-Shulcloper J (2011) A survey of clustering ensemble algorithms. Int J Pattern Recognit Artif Intell 25:337–372MathSciNetCrossRef Vega-Pons S, Ruiz-Shulcloper J (2011) A survey of clustering ensemble algorithms. Int J Pattern Recognit Artif Intell 25:337–372MathSciNetCrossRef
96.
Zurück zum Zitat Handl J, Meyer B (2007) Ant-based and swarm-based clustering. Swarm Intell 1:95–113CrossRef Handl J, Meyer B (2007) Ant-based and swarm-based clustering. Swarm Intell 1:95–113CrossRef
97.
Zurück zum Zitat Abraham A, Das S, Roy S (2008) Swarm intelligence algorithms for data clustering. In: Soft computing for knowledge discovery and data mining, pp 279–313 Abraham A, Das S, Roy S (2008) Swarm intelligence algorithms for data clustering. In: Soft computing for knowledge discovery and data mining, pp 279–313
98.
Zurück zum Zitat Van der Merwe D, Engelbrecht A (2003) Data clustering using particle swarm optimization. Congr Evol Comput 1:215–220 Van der Merwe D, Engelbrecht A (2003) Data clustering using particle swarm optimization. Congr Evol Comput 1:215–220
99.
Zurück zum Zitat Amiri B, Fathian M, Maroosi A (2009) Application of shuffled frog-leaping algorithm on clustering. Int J Adv Manuf Technol 45:199–209CrossRef Amiri B, Fathian M, Maroosi A (2009) Application of shuffled frog-leaping algorithm on clustering. Int J Adv Manuf Technol 45:199–209CrossRef
100.
Zurück zum Zitat Karaboga D, Ozturk C (2011) A novel clustering approach: artificial bee colony (ABC) algorithm. Appl Soft Comput 11:652–657CrossRef Karaboga D, Ozturk C (2011) A novel clustering approach: artificial bee colony (ABC) algorithm. Appl Soft Comput 11:652–657CrossRef
101.
Zurück zum Zitat Lumer E, Faieta B (1994) Diversity and adaptation in populations of clustering ants. Proc Third Int Conf Simul Adapt Behav 3:501–508 Lumer E, Faieta B (1994) Diversity and adaptation in populations of clustering ants. Proc Third Int Conf Simul Adapt Behav 3:501–508
102.
Shelokar P, Jayaraman V, Kulkarni B (2004) An ant colony approach for clustering. Anal Chim Acta 509:187–195
103.
Karaboga D, Akay B (2009) A survey: algorithms simulating bee swarm intelligence. Artif Intell Rev 31:61–85
104.
Xu R, Xu J, Wunsch D (2012) A comparison study of validity indices on swarm-intelligence-based clustering. IEEE Trans Syst Man Cybern Part B 42:1243–1256
105.
Horn D, Gottlieb A (2001) Algorithm for data clustering in pattern recognition problems based on quantum mechanics. Phys Rev Lett 88:018702
106.
Horn D, Gottlieb A (2001) The method of quantum clustering. In: Advances in neural information processing systems, pp 769–776
107.
Weinstein M, Horn D (2009) Dynamic quantum clustering: a method for visual exploration of structures in data. Phys Rev E 80:066117
109.
Horn D, Axel I (2003) Novel clustering algorithm for microarray expression data in a truncated SVD space. Bioinformatics 19:1110–1115
110.
Aïmeur E, Brassard G, Gambs S (2007) Quantum clustering algorithms. In: ICML, pp 1–8
112.
Yu S, Shi J (2003) Multiclass spectral clustering. In: Proceedings of the ninth IEEE international conference on computer vision, pp 313–319
113.
Verma D, Meila M (2003) A comparison of spectral clustering algorithms. University of Washington Tech Rep UWCSE030501 1:1–18
114.
Chen W, Song Y, Bai H, Lin C, Chang E (2011) Parallel spectral clustering in distributed systems. IEEE Trans Pattern Anal Mach Intell 33:568–586
115.
Lu Z, Carreira-Perpinan M (2008) Constrained spectral clustering through affinity propagation. In: IEEE conference on computer vision and pattern recognition, pp 1–8
117.
Shang F, Jiao L, Shi J, Wang F, Gong M (2012) Fast affinity propagation clustering: a multilevel approach. Pattern Recognit 45:474–486
118.
119.
Ng R, Han J (1994) Efficient and effective clustering methods for spatial data mining. In: VLDB, pp 144–155
120.
Sander J, Ester M, Kriegel H, Xu X (1998) Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Min Knowl Discov 2:169–194
121.
Harel D, Koren Y (2001) Clustering spatial data using random walks. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining, pp 281–286
122.
Zaïane O, Lee C (2002) Clustering spatial data when facing physical constraints. In: Proceedings of the IEEE international conference on data mining, pp 737–740
123.
Birant D, Kut A (2007) ST-DBSCAN: an algorithm for clustering spatial-temporal data. Data Knowl Eng 60:208–221
124.
O’Callaghan L, Meyerson A, Motwani R, Mishra N, Guha S (2002) Streaming-data algorithms for high-quality clustering. In: ICDE, p 0685
125.
Aggarwal C, Han J, Wang J, Yu P (2003) A framework for clustering evolving data streams. In: VLDB, pp 81–92
126.
Aggarwal C, Han J, Wang J, Yu P (2004) A framework for projected clustering of high dimensional data streams. In: VLDB, pp 852–863
127.
Cao F, Ester M, Qian W, Zhou A (2006) Density-based clustering over an evolving data stream with noise. SDM 6:328–339
128.
Guha S, Mishra N, Motwani R, O’Callaghan L (2000) Clustering data streams. In: Proceedings of the 41st annual symposium on foundations of computer science, pp 359–366
129.
Barbará D (2002) Requirements for clustering data streams. ACM SIGKDD Explor Newsl 3:23–27
130.
Guha S, Meyerson A, Mishra N, Motwani R, O’Callaghan L (2003) Clustering data streams: theory and practice. IEEE Trans Knowl Data Eng 15:515–528
131.
Beringer J, Hüllermeier E (2006) Online clustering of parallel data streams. Data Knowl Eng 58:180–204
132.
Silva J, Faria E, Barros R, Hruschka E, de Carvalho A, Gama J (2013) Data stream clustering: a survey. ACM Comput Surv 46:13
133.
Leskovec J, Rajaraman A, Ullman JD (2014) Mining massive datasets. Cambridge University Press, Cambridge
134.
Steinbach M, Karypis G, Kumar V (2000) A comparison of document clustering techniques. KDD Workshop Text Min 400:525–526
135.
Parsons L, Haque E, Liu H (2004) Subspace clustering for high dimensional data: a review. ACM SIGKDD Explor Newsl 6:90–105
136.
Kriegel H, Kröger P, Zimek A (2009) Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Discov Data 3:1
137.
Judd D, McKinley P, Jain A (1996) Large-scale parallel data clustering. In: Proceedings of the 13th international conference on pattern recognition, vol 4, pp 488–493
138.
Tasoulis D, Vrahatis M (2004) Unsupervised distributed clustering. In: Parallel and distributed computing and networks, pp 347–351
139.
Zhao W, Ma H, He Q (2009) Parallel k-means clustering based on MapReduce. In: Cloud computing, pp 674–679
140.
Herwig R, Poustka A, Müller C, Bull C, Lehrach H, O’Brien J (1999) Large-scale clustering of cDNA-fingerprinting data. Genome Res 9:1093–1105
141.
Hinneburg A, Keim D (2003) A general approach to clustering in large databases with noise. Knowl Inf Syst 5:387–415
Metadata
Title
A Comprehensive Survey of Clustering Algorithms
Authors
Dongkuan Xu
Yingjie Tian
Publication date
1 June 2015
Publisher
Springer Berlin Heidelberg
Published in
Annals of Data Science / Issue 2/2015
Print ISSN: 2198-5804
Electronic ISSN: 2198-5812
DOI
https://doi.org/10.1007/s40745-015-0040-1