Skip to main content
Erschienen in: Soft Computing 6/2011

01.06.2011 | Focus

CHSMST: a clustering algorithm based on hyper surface and minimum spanning tree

verfasst von: Qing He, Weizhong Zhao, Zhongzhi Shi

Erschienen in: Soft Computing | Ausgabe 6/2011

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

As data mining having attracted a significant amount of research attention, many clustering algorithms have been proposed in the past decades. However, most of existing clustering methods have high computational time or are not suitable for discovering clusters with non-convex shape. In this paper, an efficient clustering algorithm CHSMST is proposed, which is based on clustering based on hyper surface (CHS) and minimum spanning tree. In the first step, CHSMST applies CHS to obtain initial clusters immediately. Thereafter, minimum spanning tree is introduced to handle locally dense data which is hard for CHS to deal with. The experiments show that CHSMST can discover clusters with arbitrary shape. Moreover, CHSMST is insensitive to the order of input samples and the run time of the algorithm increases moderately as the scale of dataset becomes large.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
All the experimental results are given under the computing environment as follows. (1) Main computer Processor: Intel (R) Core 2 Duo CPU @ 2.33 GHz Memory: 2 GB; (2) Operating system Microsoft Windows XP Professional, Service Pack 2.
 
Literatur
Zurück zum Zitat Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the ACM SIGMOD international conference on management of data. Seattle, Washington, DC, USA, pp 94–105 Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the ACM SIGMOD international conference on management of data. Seattle, Washington, DC, USA, pp 94–105
Zurück zum Zitat Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum, New YorkMATH Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum, New YorkMATH
Zurück zum Zitat Ester M, Kriegel HP, Sander J, Xu XW (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd international conference on knowledge discovery and data mining, pp 226–231 Ester M, Kriegel HP, Sander J, Xu XW (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd international conference on knowledge discovery and data mining, pp 226–231
Zurück zum Zitat Everitt B, Landau S, Leese M (2001) Cluster analysis. Arnold, LondonMATH Everitt B, Landau S, Leese M (2001) Cluster analysis. Arnold, LondonMATH
Zurück zum Zitat Gao XB, Xie WX (2000) Advances in theory and applications of fuzzy clustering. Chin Sci Bull 45(11):961–970CrossRefMATH Gao XB, Xie WX (2000) Advances in theory and applications of fuzzy clustering. Chin Sci Bull 45(11):961–970CrossRefMATH
Zurück zum Zitat Gao XB, Ji H, Li J (2002) An advanced cluster analysis method based on statistical test. In: Proceedings of IEEE international conference on signal processing, pp 1100–1103 Gao XB, Ji H, Li J (2002) An advanced cluster analysis method based on statistical test. In: Proceedings of IEEE international conference on signal processing, pp 1100–1103
Zurück zum Zitat Gao XB, Li J, Tao DC, Li XL (2007) Fuzziness measurement of fuzzy sets and its application in cluster validity analysis. Int J Fuzzy Syst 9(4):188–197MathSciNet Gao XB, Li J, Tao DC, Li XL (2007) Fuzziness measurement of fuzzy sets and its application in cluster validity analysis. Int J Fuzzy Syst 9(4):188–197MathSciNet
Zurück zum Zitat Guha S, Rastogi R, Shim K (1998) CURE: an efficient clustering algorithm for large databases. In: Proceedings of ACM SIGMOD international conference on management of data, pp 73–84 Guha S, Rastogi R, Shim K (1998) CURE: an efficient clustering algorithm for large databases. In: Proceedings of ACM SIGMOD international conference on management of data, pp 73–84
Zurück zum Zitat Han JW, Kamber M (2001) Data mining: concepts and techniques. Morgan Kaufmann Publishers, San Francisco Han JW, Kamber M (2001) Data mining: concepts and techniques. Morgan Kaufmann Publishers, San Francisco
Zurück zum Zitat Hansen P, Jaumard B (1997) Cluster analysis and mathematical programming. Math Program 79:191–215MathSciNetMATH Hansen P, Jaumard B (1997) Cluster analysis and mathematical programming. Math Program 79:191–215MathSciNetMATH
Zurück zum Zitat Hathaway R, Bezdek J (2001) Fuzzy c-means clustering of incomplete data. IEEE Trans Syst Man Cybern, Part B 31(5):735–744CrossRef Hathaway R, Bezdek J (2001) Fuzzy c-means clustering of incomplete data. IEEE Trans Syst Man Cybern, Part B 31(5):735–744CrossRef
Zurück zum Zitat He Q, Shi ZZ, Ren LA (2002) The classification method based on hyper surface. In: Proceedings of the 2002 international joint conference on neural networks, pp 1499–1503 He Q, Shi ZZ, Ren LA (2002) The classification method based on hyper surface. In: Proceedings of the 2002 international joint conference on neural networks, pp 1499–1503
Zurück zum Zitat He Q, Shi ZZ, Ren LA, Lee ES (2003) A novel classification method based on hypersurface. Int J Math Comput Model 38:395–407MathSciNetCrossRefMATH He Q, Shi ZZ, Ren LA, Lee ES (2003) A novel classification method based on hypersurface. Int J Math Comput Model 38:395–407MathSciNetCrossRefMATH
Zurück zum Zitat He Q, Zhao XR, Shi ZZ (2008) Minimal consistent subset for hyper surface classification method. Int J Pattern Recognit Artif Intell 22(1):95–108CrossRef He Q, Zhao XR, Shi ZZ (2008) Minimal consistent subset for hyper surface classification method. Int J Pattern Recognit Artif Intell 22(1):95–108CrossRef
Zurück zum Zitat Hung M, Yang D (2001) An efficient fuzzy c-means clustering algorithm. In: Proceedings of IEEE international conference on data mining, pp 225–232 Hung M, Yang D (2001) An efficient fuzzy c-means clustering algorithm. In: Proceedings of IEEE international conference on data mining, pp 225–232
Zurück zum Zitat Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Englewood CliffsMATH Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Englewood CliffsMATH
Zurück zum Zitat Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–36CrossRef Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–36CrossRef
Zurück zum Zitat Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323CrossRef Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323CrossRef
Zurück zum Zitat Karypis G, Han E, Kumar V (1999) Chameleon: hierarchical clustering using dynamic modeling. IEEE Comput 32(8):68–75 Karypis G, Han E, Kumar V (1999) Chameleon: hierarchical clustering using dynamic modeling. IEEE Comput 32(8):68–75
Zurück zum Zitat Kolen J, Hutcheson T (2002) Reducing the time complexity of the fuzzy c-means algorithm. IEEE Trans Fuzzy Syst 10(2):263–267CrossRef Kolen J, Hutcheson T (2002) Reducing the time complexity of the fuzzy c-means algorithm. IEEE Trans Fuzzy Syst 10(2):263–267CrossRef
Zurück zum Zitat Kruskal JB (1956) On the shortest spanning subtree of a graph and the traveling salesman problem. Proc Am Math Soc 7:48–50 Kruskal JB (1956) On the shortest spanning subtree of a graph and the traveling salesman problem. Proc Am Math Soc 7:48–50
Zurück zum Zitat McQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, vol 1, pp 281–287 McQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, vol 1, pp 281–287
Zurück zum Zitat Ng RT, Han JW (1994) Efficient and effective clustering methods for spatial data mining. In: Proceedings of the 20th VLDB conference, pp 144–155 Ng RT, Han JW (1994) Efficient and effective clustering methods for spatial data mining. In: Proceedings of the 20th VLDB conference, pp 144–155
Zurück zum Zitat Prim RC (1957) Shortest connection networks and some generalizations. Bell Syst Tech J 36(1):1389–1401 Prim RC (1957) Shortest connection networks and some generalizations. Bell Syst Tech J 36(1):1389–1401
Zurück zum Zitat Raraldi A, Blonda P (1999) A survey of fuzzy clustering algorithms for pattern recognition, Part I and II. IEEE Trans Syst Man Cybern, Part B 29(6):778–801CrossRef Raraldi A, Blonda P (1999) A survey of fuzzy clustering algorithms for pattern recognition, Part I and II. IEEE Trans Syst Man Cybern, Part B 29(6):778–801CrossRef
Zurück zum Zitat Stonebraker M, Frew J, Gardels K, Meredith J (1993) The SEQUOIA 2000 storage benchmark. In: Proceedings of ACM SIGMOD international conference on management of data, pp 2–11 Stonebraker M, Frew J, Gardels K, Meredith J (1993) The SEQUOIA 2000 storage benchmark. In: Proceedings of ACM SIGMOD international conference on management of data, pp 2–11
Zurück zum Zitat Wu KL, Yu J, Yang MS (2005) A novel fuzzy clustering algorithm based on a fuzzy scatter matrix with optimality tests. Pattern Recognit Lett 26:639–652CrossRef Wu KL, Yu J, Yang MS (2005) A novel fuzzy clustering algorithm based on a fuzzy scatter matrix with optimality tests. Pattern Recognit Lett 26:639–652CrossRef
Zurück zum Zitat Xu R, Wunsch II D (2005) Survey of clustering algorithms. IEEE Trans Neural Net 16(3):645–678CrossRef Xu R, Wunsch II D (2005) Survey of clustering algorithms. IEEE Trans Neural Net 16(3):645–678CrossRef
Zurück zum Zitat Yu J (2005) General c-means clustering model. IEEE Trans Pattern Anal Mach Intell 27(8):1197–1211CrossRef Yu J (2005) General c-means clustering model. IEEE Trans Pattern Anal Mach Intell 27(8):1197–1211CrossRef
Zurück zum Zitat Yu J, Cheng QS, Huang HK (2004) Analysis of the weighting exponent in the FCM. IEEE Trans Syst Man Cybern, Part B: Cybern 34(1):634–639CrossRef Yu J, Cheng QS, Huang HK (2004) Analysis of the weighting exponent in the FCM. IEEE Trans Syst Man Cybern, Part B: Cybern 34(1):634–639CrossRef
Zurück zum Zitat Yu J, Yang MS (2005) Optimality test for generalized FCM and its application on parameter selection. IEEE Trans Fuzzy Syst 13(1):164–176CrossRef Yu J, Yang MS (2005) Optimality test for generalized FCM and its application on parameter selection. IEEE Trans Fuzzy Syst 13(1):164–176CrossRef
Metadaten
Titel
CHSMST: a clustering algorithm based on hyper surface and minimum spanning tree
verfasst von
Qing He
Weizhong Zhao
Zhongzhi Shi
Publikationsdatum
01.06.2011
Verlag
Springer-Verlag
Erschienen in
Soft Computing / Ausgabe 6/2011
Print ISSN: 1432-7643
Elektronische ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-010-0585-z

Weitere Artikel der Ausgabe 6/2011

Soft Computing 6/2011 Zur Ausgabe