Skip to main content
Top
Published in: Neural Computing and Applications 1/2018

14-11-2016 | Original Article

Cluster merging based on a decision threshold

Authors: Jian Hou, Boping Zhang

Published in: Neural Computing and Applications | Issue 1/2018

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Data clustering is an important unsupervised learning technique and has wide application in various fields including pattern recognition, data mining, image analysis and bioinformatics. A vast amount of clustering algorithms have been proposed in the past decades. However, existing algorithms still face many problems in practical applications. One typical problem is the parameter dependence, which means that user-specified parameters are required as input and the clustering results are influenced by these parameters. Another problem is that many algorithms are not able to generate clusters of non-spherical shapes. In this paper, a cluster merging method is proposed to solve the above-mentioned problems based on a decision threshold and the dominant sets algorithm. Firstly, the influence of similarity parameter on dominant sets clustering results is studied, and it is found that the obtained clusters become larger with the increase in similarity parameter. We analyze the reason behind this behavior and propose to generate small initial clusters in the first step and then merge the initial clusters to improve the clustering results. Specifically, we select a similarity parameter which generates small but not too small clusters. Then, we calculate pairwise merging decisions among the initial clusters and obtain a merging decision threshold. Based on this threshold, we merge the small clusters and obtain the final clustering results. Experiments on several datasets are used to validate the effectiveness of the proposed algorithm.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Agrawal R, Gehrke J, Gunopulos D, Raghavan P (2005) Automatic subspace clustering of high dimensional data. In: International conference on knowledge discovery and data mining, pp 517–521 Agrawal R, Gehrke J, Gunopulos D, Raghavan P (2005) Automatic subspace clustering of high dimensional data. In: International conference on knowledge discovery and data mining, pp 517–521
4.
go back to reference Bulo SR, Pelillo M, Bomze IM (2011) Graph-based quadratic optimization: a fast evolutionary approach. Comput Vis Image Underst 115(7):984–995CrossRef Bulo SR, Pelillo M, Bomze IM (2011) Graph-based quadratic optimization: a fast evolutionary approach. Comput Vis Image Underst 115(7):984–995CrossRef
5.
go back to reference Bulo SR, Torsello A, Pelillo M (2009) A game-theoretic approach to partial clique enumeration. Image Vis Comput 27(7):911–922CrossRefMATH Bulo SR, Torsello A, Pelillo M (2009) A game-theoretic approach to partial clique enumeration. Image Vis Comput 27(7):911–922CrossRefMATH
6.
go back to reference Couprie C, Grady L, Najman L, Talbot H (2009) Power watersheds: a new image segmentation framework extending graph cuts, random walker and optimal spanning forest. In: IEEE international conference on computer vision, pp 731–738 Couprie C, Grady L, Najman L, Talbot H (2009) Power watersheds: a new image segmentation framework extending graph cuts, random walker and optimal spanning forest. In: IEEE international conference on computer vision, pp 731–738
7.
go back to reference Daszykowski M, Walczak B, Massart DL (2001) Looking for natural patterns in data: part 1. Density-based approach. Chemom Intell Lab Syst 56(2):83–92CrossRef Daszykowski M, Walczak B, Massart DL (2001) Looking for natural patterns in data: part 1. Density-based approach. Chemom Intell Lab Syst 56(2):83–92CrossRef
8.
go back to reference Demaine ED, Emanuel D, Fiat A, Immorlica N (2006) Correlation clustering in general weighted graphs. Theor Comput Sci 361:172–187MathSciNetCrossRefMATH Demaine ED, Emanuel D, Fiat A, Immorlica N (2006) Correlation clustering in general weighted graphs. Theor Comput Sci 361:172–187MathSciNetCrossRefMATH
9.
go back to reference Ester M, Kriegel HP, Sander J, Xu XW (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: International conference on knowledge discovery and data mining, pp 226–231 Ester M, Kriegel HP, Sander J, Xu XW (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: International conference on knowledge discovery and data mining, pp 226–231
10.
go back to reference Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software structure: a simulation study. Mol Ecol 14(8):2611–2620CrossRef Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software structure: a simulation study. Mol Ecol 14(8):2611–2620CrossRef
11.
go back to reference Fowlkes C, Belongie S, Fan C, Malik J (2004) Spectral grouping using the nystrom method. IEEE Trans Pattern Anal Mach Intell 26(2):214–225CrossRef Fowlkes C, Belongie S, Fan C, Malik J (2004) Spectral grouping using the nystrom method. IEEE Trans Pattern Anal Mach Intell 26(2):214–225CrossRef
12.
go back to reference Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588CrossRefMATH Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588CrossRefMATH
13.
go back to reference Frommlet F (2010) Tag snp selection based on clustering according to dominant sets found using replicator dynamics. Adv Data Anal Classif 4:65–83MathSciNetCrossRefMATH Frommlet F (2010) Tag snp selection based on clustering according to dominant sets found using replicator dynamics. Adv Data Anal Classif 4:65–83MathSciNetCrossRefMATH
14.
go back to reference Fu L, Medico E (2007) Flame, a novel fuzzy clustering method for the analysis of DNA microarray data. BMC Bioinform 8(1):1–17CrossRef Fu L, Medico E (2007) Flame, a novel fuzzy clustering method for the analysis of DNA microarray data. BMC Bioinform 8(1):1–17CrossRef
15.
go back to reference Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data 1(1):1–30CrossRef Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data 1(1):1–30CrossRef
16.
go back to reference Gong M, Liang Y, Shi J, Ma W, Ma J (2013) Fuzzy c-means clustering with local information and kernel metric for image segmentation. IEEE Trans Image Process 22(2):573–584MathSciNetCrossRefMATH Gong M, Liang Y, Shi J, Ma W, Ma J (2013) Fuzzy c-means clustering with local information and kernel metric for image segmentation. IEEE Trans Image Process 22(2):573–584MathSciNetCrossRefMATH
17.
go back to reference Hamid R, Maddi S, Johnson AY, Bobick AF, Essa IA, Isbell C (2009) A novel sequence representation for unsupervised analysis of human activities. Artif Intell 173:1221–1244MathSciNetCrossRef Hamid R, Maddi S, Johnson AY, Bobick AF, Essa IA, Isbell C (2009) A novel sequence representation for unsupervised analysis of human activities. Artif Intell 173:1221–1244MathSciNetCrossRef
18.
go back to reference Hou J, Gao H, Li X (2016) Dsets-dbscan: a parameter-free clustering algorithm. IEEE Trans Image Process 25(7):3182–3193MathSciNetCrossRef Hou J, Gao H, Li X (2016) Dsets-dbscan: a parameter-free clustering algorithm. IEEE Trans Image Process 25(7):3182–3193MathSciNetCrossRef
19.
go back to reference Hou J, Liu W, Xu E, Cui H (2016) Towards parameter-independent data clustering and image segmentation. Pattern Recognit 60:25–36CrossRef Hou J, Liu W, Xu E, Cui H (2016) Towards parameter-independent data clustering and image segmentation. Pattern Recognit 60:25–36CrossRef
20.
go back to reference Hou J, Qi X, Qi NM (2015) Experimental study on dominant sets clustering. IET Comput Vis 9(2):208–215CrossRef Hou J, Qi X, Qi NM (2015) Experimental study on dominant sets clustering. IET Comput Vis 9(2):208–215CrossRef
21.
go back to reference Hou J, Sha C, Cui H, Chi L (2016) Dominant set based data clustering and image segmentation. In: International conference on multimedia modeling, pp 432–443 Hou J, Sha C, Cui H, Chi L (2016) Dominant set based data clustering and image segmentation. In: International conference on multimedia modeling, pp 432–443
22.
go back to reference Jain AK, Law MHC (2005) Data clustering: a user’s dilemma. In: International conference on pattern recognition and machine intelligence, pp 1–10 Jain AK, Law MHC (2005) Data clustering: a user’s dilemma. In: International conference on pattern recognition and machine intelligence, pp 1–10
23.
go back to reference Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1–2):91–118CrossRefMATH Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1–2):91–118CrossRefMATH
24.
go back to reference Niu D, Dy JG, Jordan MI (2011) Dimensionality reduction for spectral clustering. In: International conference on artificial intelligence and statistics, pp 552–560 Niu D, Dy JG, Jordan MI (2011) Dimensionality reduction for spectral clustering. In: International conference on artificial intelligence and statistics, pp 552–560
25.
go back to reference Pavan M, Pelillo M (2003) A graph-theoretic approach to clustering and segmentation. In: IEEE international conference on computer vision and pattern recognition, pp 145–152 Pavan M, Pelillo M (2003) A graph-theoretic approach to clustering and segmentation. In: IEEE international conference on computer vision and pattern recognition, pp 145–152
26.
go back to reference Pavan M, Pelillo M (2007) Dominant sets and pairwise clustering. IEEE Trans Pattern Anal Mach Intell 29(1):167–172CrossRef Pavan M, Pelillo M (2007) Dominant sets and pairwise clustering. IEEE Trans Pattern Anal Mach Intell 29(1):167–172CrossRef
27.
go back to reference Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344:1492–1496CrossRef Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344:1492–1496CrossRef
28.
go back to reference Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):167–172 Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):167–172
29.
go back to reference Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Ser B Stat Methodol 63(2):411–423MathSciNetCrossRefMATH Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc Ser B Stat Methodol 63(2):411–423MathSciNetCrossRefMATH
30.
go back to reference Torsello A, Bulo SR, Pelillo M (2008) Beyond partitions: allowing overlapping groups in pairwise clustering. In: International conference on pattern recognition, pp 1–4 Torsello A, Bulo SR, Pelillo M (2008) Beyond partitions: allowing overlapping groups in pairwise clustering. In: International conference on pattern recognition, pp 1–4
31.
go back to reference Veenman CJ, Reinders M, Backer E (2002) A maximum variance cluster algorithm. IEEE Trans Pattern Anal Mach Intell 24(9):1273–1280CrossRef Veenman CJ, Reinders M, Backer E (2002) A maximum variance cluster algorithm. IEEE Trans Pattern Anal Mach Intell 24(9):1273–1280CrossRef
32.
go back to reference Yan D, Huang L, Jordan MI (2009) Fast approximate spectral clustering. In: International conference on knowledge discovery and data mining, pp 907–916 Yan D, Huang L, Jordan MI (2009) Fast approximate spectral clustering. In: International conference on knowledge discovery and data mining, pp 907–916
33.
go back to reference Yang XW, Liu HR, Laecki LJ (2012) Contour-based object detection as dominant set computation. Pattern Recognit 45:1927–1936CrossRef Yang XW, Liu HR, Laecki LJ (2012) Contour-based object detection as dominant set computation. Pattern Recognit 45:1927–1936CrossRef
34.
go back to reference Zhu X, Loy CC, Gong S (2014) Constructing robust affinity graphs for spectral clustering. In: IEEE international conference on computer vision and pattern recognition, pp 1450–1457 Zhu X, Loy CC, Gong S (2014) Constructing robust affinity graphs for spectral clustering. In: IEEE international conference on computer vision and pattern recognition, pp 1450–1457
Metadata
Title
Cluster merging based on a decision threshold
Authors
Jian Hou
Boping Zhang
Publication date
14-11-2016
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 1/2018
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-016-2699-4

Other articles of this Issue 1/2018

Neural Computing and Applications 1/2018 Go to the issue

Premium Partner