Skip to main content
Erschienen in: Neural Computing and Applications 1/2018

14.11.2016 | Original Article

Cluster merging based on a decision threshold

verfasst von: Jian Hou, Boping Zhang

Erschienen in: Neural Computing and Applications | Ausgabe 1/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Data clustering is an important unsupervised learning technique and has wide application in various fields including pattern recognition, data mining, image analysis and bioinformatics. A vast amount of clustering algorithms have been proposed in the past decades. However, existing algorithms still face many problems in practical applications. One typical problem is the parameter dependence, which means that user-specified parameters are required as input and the clustering results are influenced by these parameters. Another problem is that many algorithms are not able to generate clusters of non-spherical shapes. In this paper, a cluster merging method is proposed to solve the above-mentioned problems based on a decision threshold and the dominant sets algorithm. Firstly, the influence of similarity parameter on dominant sets clustering results is studied, and it is found that the obtained clusters become larger with the increase in similarity parameter. We analyze the reason behind this behavior and propose to generate small initial clusters in the first step and then merge the initial clusters to improve the clustering results. Specifically, we select a similarity parameter which generates small but not too small clusters. Then, we calculate pairwise merging decisions among the initial clusters and obtain a merging decision threshold. Based on this threshold, we merge the small clusters and obtain the final clustering results. Experiments on several datasets are used to validate the effectiveness of the proposed algorithm.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Agrawal R, Gehrke J, Gunopulos D, Raghavan P (2005) Automatic subspace clustering of high dimensional data. In: International conference on knowledge discovery and data mining, pp 517–521 Agrawal R, Gehrke J, Gunopulos D, Raghavan P (2005) Automatic subspace clustering of high dimensional data. In: International conference on knowledge discovery and data mining, pp 517–521
4.
Zurück zum Zitat Bulo SR, Pelillo M, Bomze IM (2011) Graph-based quadratic optimization: a fast evolutionary approach. Comput Vis Image Underst 115(7):984–995CrossRef Bulo SR, Pelillo M, Bomze IM (2011) Graph-based quadratic optimization: a fast evolutionary approach. Comput Vis Image Underst 115(7):984–995CrossRef
5.
Zurück zum Zitat Bulo SR, Torsello A, Pelillo M (2009) A game-theoretic approach to partial clique enumeration. Image Vis Comput 27(7):911–922CrossRefMATH Bulo SR, Torsello A, Pelillo M (2009) A game-theoretic approach to partial clique enumeration. Image Vis Comput 27(7):911–922CrossRefMATH
6.
Zurück zum Zitat Couprie C, Grady L, Najman L, Talbot H (2009) Power watersheds: a new image segmentation framework extending graph cuts, random walker and optimal spanning forest. In: IEEE international conference on computer vision, pp 731–738 Couprie C, Grady L, Najman L, Talbot H (2009) Power watersheds: a new image segmentation framework extending graph cuts, random walker and optimal spanning forest. In: IEEE international conference on computer vision, pp 731–738
7.
Zurück zum Zitat Daszykowski M, Walczak B, Massart DL (2001) Looking for natural patterns in data: part 1. Density-based approach. Chemom Intell Lab Syst 56(2):83–92CrossRef Daszykowski M, Walczak B, Massart DL (2001) Looking for natural patterns in data: part 1. Density-based approach. Chemom Intell Lab Syst 56(2):83–92CrossRef
8.
Zurück zum Zitat Demaine ED, Emanuel D, Fiat A, Immorlica N (2006) Correlation clustering in general weighted graphs. Theor Comput Sci 361:172–187MathSciNetCrossRefMATH Demaine ED, Emanuel D, Fiat A, Immorlica N (2006) Correlation clustering in general weighted graphs. Theor Comput Sci 361:172–187MathSciNetCrossRefMATH
9.
Zurück zum Zitat Ester M, Kriegel HP, Sander J, Xu XW (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: International conference on knowledge discovery and data mining, pp 226–231 Ester M, Kriegel HP, Sander J, Xu XW (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: International conference on knowledge discovery and data mining, pp 226–231
10.
Zurück zum Zitat Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software structure: a simulation study. Mol Ecol 14(8):2611–2620CrossRef Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software structure: a simulation study. Mol Ecol 14(8):2611–2620CrossRef
11.
Zurück zum Zitat Fowlkes C, Belongie S, Fan C, Malik J (2004) Spectral grouping using the nystrom method. IEEE Trans Pattern Anal Mach Intell 26(2):214–225CrossRef Fowlkes C, Belongie S, Fan C, Malik J (2004) Spectral grouping using the nystrom method. IEEE Trans Pattern Anal Mach Intell 26(2):214–225CrossRef
12.
Zurück zum Zitat Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588CrossRefMATH Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588CrossRefMATH
13.
Zurück zum Zitat Frommlet F (2010) Tag snp selection based on clustering according to dominant sets found using replicator dynamics. Adv Data Anal Classif 4:65–83MathSciNetCrossRefMATH Frommlet F (2010) Tag snp selection based on clustering according to dominant sets found using replicator dynamics. Adv Data Anal Classif 4:65–83MathSciNetCrossRefMATH
14.
Zurück zum Zitat Fu L, Medico E (2007) Flame, a novel fuzzy clustering method for the analysis of DNA microarray data. BMC Bioinform 8(1):1–17CrossRef Fu L, Medico E (2007) Flame, a novel fuzzy clustering method for the analysis of DNA microarray data. BMC Bioinform 8(1):1–17CrossRef
15.
Zurück zum Zitat Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data 1(1):1–30CrossRef Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data 1(1):1–30CrossRef
16.
Zurück zum Zitat Gong M, Liang Y, Shi J, Ma W, Ma J (2013) Fuzzy c-means clustering with local information and kernel metric for image segmentation. IEEE Trans Image Process 22(2):573–584MathSciNetCrossRefMATH Gong M, Liang Y, Shi J, Ma W, Ma J (2013) Fuzzy c-means clustering with local information and kernel metric for image segmentation. IEEE Trans Image Process 22(2):573–584MathSciNetCrossRefMATH
17.
Zurück zum Zitat Hamid R, Maddi S, Johnson AY, Bobick AF, Essa IA, Isbell C (2009) A novel sequence representation for unsupervised analysis of human activities. Artif Intell 173:1221–1244MathSciNetCrossRef Hamid R, Maddi S, Johnson AY, Bobick AF, Essa IA, Isbell C (2009) A novel sequence representation for unsupervised analysis of human activities. Artif Intell 173:1221–1244MathSciNetCrossRef
18.
Zurück zum Zitat Hou J, Gao H, Li X (2016) Dsets-dbscan: a parameter-free clustering algorithm. IEEE Trans Image Process 25(7):3182–3193MathSciNetCrossRef Hou J, Gao H, Li X (2016) Dsets-dbscan: a parameter-free clustering algorithm. IEEE Trans Image Process 25(7):3182–3193MathSciNetCrossRef
19.
Zurück zum Zitat Hou J, Liu W, Xu E, Cui H (2016) Towards parameter-independent data clustering and image segmentation. Pattern Recognit 60:25–36CrossRef Hou J, Liu W, Xu E, Cui H (2016) Towards parameter-independent data clustering and image segmentation. Pattern Recognit 60:25–36CrossRef
20.
Zurück zum Zitat Hou J, Qi X, Qi NM (2015) Experimental study on dominant sets clustering. IET Comput Vis 9(2):208–215CrossRef Hou J, Qi X, Qi NM (2015) Experimental study on dominant sets clustering. IET Comput Vis 9(2):208–215CrossRef
21.
Zurück zum Zitat Hou J, Sha C, Cui H, Chi L (2016) Dominant set based data clustering and image segmentation. In: International conference on multimedia modeling, pp 432–443 Hou J, Sha C, Cui H, Chi L (2016) Dominant set based data clustering and image segmentation. In: International conference on multimedia modeling, pp 432–443
22.
Zurück zum Zitat Jain AK, Law MHC (2005) Data clustering: a user’s dilemma. In: International conference on pattern recognition and machine intelligence, pp 1–10 Jain AK, Law MHC (2005) Data clustering: a user’s dilemma. In: International conference on pattern recognition and machine intelligence, pp 1–10
23.
Zurück zum Zitat Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1–2):91–118CrossRefMATH Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1–2):91–118CrossRefMATH
24.
Zurück zum Zitat Niu D, Dy JG, Jordan MI (2011) Dimensionality reduction for spectral clustering. In: International conference on artificial intelligence and statistics, pp 552–560 Niu D, Dy JG, Jordan MI (2011) Dimensionality reduction for spectral clustering. In: International conference on artificial intelligence and statistics, pp 552–560
25.
Zurück zum Zitat Pavan M, Pelillo M (2003) A graph-theoretic approach to clustering and segmentation. In: IEEE international conference on computer vision and pattern recognition, pp 145–152 Pavan M, Pelillo M (2003) A graph-theoretic approach to clustering and segmentation. In: IEEE international conference on computer vision and pattern recognition, pp 145–152
26.
Zurück zum Zitat Pavan M, Pelillo M (2007) Dominant sets and pairwise clustering. IEEE Trans Pattern Anal Mach Intell 29(1):167–172CrossRef Pavan M, Pelillo M (2007) Dominant sets and pairwise clustering. IEEE Trans Pattern Anal Mach Intell 29(1):167–172CrossRef
27.
Zurück zum Zitat Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344:1492–1496CrossRef Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344:1492–1496CrossRef
28.
Zurück zum Zitat Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):167–172 Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):167–172
29.
Zurück zum Zitat Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Ser B Stat Methodol 63(2):411–423MathSciNetCrossRefMATH Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Ser B Stat Methodol 63(2):411–423MathSciNetCrossRefMATH
30.
Zurück zum Zitat Torsello A, Bulo SR, Pelillo M (2008) Beyond partitions: allowing overlapping groups in pairwise clustering. In: International conference on pattern recognition, pp 1–4 Torsello A, Bulo SR, Pelillo M (2008) Beyond partitions: allowing overlapping groups in pairwise clustering. In: International conference on pattern recognition, pp 1–4
31.
Zurück zum Zitat Veenman CJ, Reinders M, Backer E (2002) A maximum variance cluster algorithm. IEEE Trans Pattern Anal Mach Intell 24(9):1273–1280CrossRef Veenman CJ, Reinders M, Backer E (2002) A maximum variance cluster algorithm. IEEE Trans Pattern Anal Mach Intell 24(9):1273–1280CrossRef
32.
Zurück zum Zitat Yan D, Huang L, Jordan MI (2009) Fast approximate spectral clustering. In: International conference on knowledge discovery and data mining, pp 907–916 Yan D, Huang L, Jordan MI (2009) Fast approximate spectral clustering. In: International conference on knowledge discovery and data mining, pp 907–916
33.
Zurück zum Zitat Yang XW, Liu HR, Laecki LJ (2012) Contour-based object detection as dominant set computation. Pattern Recognit 45:1927–1936CrossRef Yang XW, Liu HR, Laecki LJ (2012) Contour-based object detection as dominant set computation. Pattern Recognit 45:1927–1936CrossRef
34.
Zurück zum Zitat Zhu X, Loy CC, Gong S (2014) Constructing robust affinity graphs for spectral clustering. In: IEEE international conference on computer vision and pattern recognition, pp 1450–1457 Zhu X, Loy CC, Gong S (2014) Constructing robust affinity graphs for spectral clustering. In: IEEE international conference on computer vision and pattern recognition, pp 1450–1457
Metadaten
Titel
Cluster merging based on a decision threshold
verfasst von
Jian Hou
Boping Zhang
Publikationsdatum
14.11.2016
Verlag
Springer London
Erschienen in
Neural Computing and Applications / Ausgabe 1/2018
Print ISSN: 0941-0643
Elektronische ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-016-2699-4

Weitere Artikel der Ausgabe 1/2018

Neural Computing and Applications 1/2018 Zur Ausgabe