nach oben

Soft Computing

Erschienen in:

01.05.2014 | Methodologies and Application

GACH: a grid-based algorithm for hierarchical clustering of high-dimensional data

verfasst von: Eghbal G. Mansoori

Erschienen in: Soft Computing | Ausgabe 5/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

This paper proposes a grid-based hierarchical clustering algorithm (GACH) as an efficient and robust method to explore clusters in high-dimensional data with no prior knowledge. It discovers the initial positions of the potential clusters automatically and then combines them hierarchically to obtain the final clusters. In this regard, GACH first projects the data patterns on a two-dimensional space (i.e., on a plane established by two features) to overcome the curse of dimensionality problem in high-dimensional data. To choose these two well-informed features, a simple and fast feature selection algorithm is proposed. Then, through meshing the plane with grid lines, GACH detects the crowded grid points. The nearest data patterns around these grid points are considered as initial members of some potential clusters. By returning the patterns back to their true dimensions, GACH refines these clusters. In the merging phase, GACH combines the closely adjacent clusters in a hierarchical bottom-up manner to construct the final clusters’ members. The main features of GACH are: (1) it automatically discovers the clusters, (2) the obtained clusters are stable, (3) it is efficient for data sets with high dimensions, and (4) its merging process involves a threshold which can be obtained in advance for well-clustered data. To assess our proposed algorithm, it is applied on some benchmark data sets and the validity of obtained clusters is compared with the results of some other clustering algorithms. This comparison shows that GACH is accurate, efficient and feasible to discover clusters in high-dimensional data.

Vorheriger Artikel Benchmarking and comparison of nature-inspired population-based continuous optimisation algorithms

Nächster Artikel A PROMETHEE-based outranking method for multiple criteria decision analysis with interval type-2 fuzzy sets

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic sub-space clustering of high dimensional data for data mining applications. In: Proceedings of ACM SIGMOD International Conference on MOD, pp 94–105

Asuncion A, Newman DJ (2007) UCI machine learning repository. Department of Information and Computer Science, University of California, Irvine

Benson SYL, Hong Y (2007) Assessment of microarray data clustering results based on a new geometrical index for cluster validity. Soft Comput 11(4):341–348

Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York

Chandra B, Gupta M (2013) A novel approach for distance-based semi-supervised clustering using functional link neural network. Soft Comput 17(3):369–379

Chang CI, Lin NP, Jan NY (2009) An axis shifted clustering algorithm. Tamkang J Sci Eng 12(2):183–192

Everitt B, Landau S, Leese M (2001) Cluster analysis. Arnold, London

Han J, Kamber M (2001) Data mining: concepts and techniques. Morgan Kaufmann, San Francisco

Hinneburg A, Keim D (1999) Optimal grid-clustering: toward breaking the curse of dimensionality in high-dimensional clustering. In: Proceedings of the 25th VLDB Conference, pp 506–517

Ilango M, Mohan V (2010) A survey of grid based clustering algorithms. Int J Eng Sci Tech 2(8):3441–3446

Jain A, Murty M, Flynn P (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323CrossRef

Jarman IH, Etchells TA, Bacciu D, Garibaldi JM, Ellis IO, Lisboa PJG (2011) Clustering of protein expression data: a benchmark of statistical and neural approaches. Soft Comput 15(8):1459–1469CrossRef

Kohavi R, Provost F (1998) Glossary of terms. Editorial for the Special Issue on Applications of Machine Learning and the Knowledge Discovery Process 30(2/3)

Krishnapuram R, Keller JM (1993) A possibilistic approach to clustering. IEEE Trans Fuzzy Syst 1(2):98–110CrossRef

Mansoori EG (2011) FRBC: a fuzzy rule-based clustering algorithm. IEEE Trans Fuzzy Syst 19(5):960–971CrossRef

Maulik U, Bandyopadhyay S (2002) Performance evaluation of some clustering algorithms and validity indices. IEEE Trans Pattern Anal Mach Intell 24(12):1650–1654CrossRef

McQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of Fifth Berkeley Symposium, Math Statistics and Probability, pp 281–297

Monmarché N, Slimane M, Venturini G (1999) AntClass: discovery of clusters in numeric data by an hybridization of an ant colony with the Kmeans algorithm. Internal Report No 213, E3i

Ng MK, Li MJ, Huang JZ, He Z (2007) On the impact of dissimilarity measure in \(k\)-modes clustering algorithm. IEEE Trans Pattern Anal Mach Intell 29(3):503–507CrossRef

Ordonez C, Omiecinski E (2004) Efficient disk-based K-means clustering for relational databases. IEEE Trans Knowl Data Eng 16(8):909–921CrossRef

Schikuta E (1993) Grid-clustering: a hierarchical clustering method for very large data sets. In: Technical Report TR-CRPC No. 93358, Center for Research on Parallel Computation, Rice University, Houston

Sheikholeslami G, Chatterjee S, Zhang A (2000) WaveCluster: a wavelet-based clustering approach for spatial data in very large databases. VLDB J Very Large Data Bases 8:289–304CrossRef

Schockaert S, De-Cock M, Cornelis C, Kerr EE (2007) Clustering web search results using fuzzy ants. Int J Intell Syst 22(5):455–474CrossRefMATH

Sledge IJ, Havens TC, Huband JM, Bezdek JC, Keller JM (2009) Finding the number of clusters in ordered dissimilarities. Soft Comput 13(12):1125–1142CrossRef

Vicente D, Vellido A (2004) A review of hierarchical models for data clustering and visualization. In: Gir’aldez R, Riquelme JC, Aguilar-Ruiz JS (eds) Tendencias de la Minería de Datos en España. Red Española de Minería de Datos

Xu R, Wunsch D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678CrossRef

Yager R, Filev D (1994) Approximate clustering via the mountain method. IEEE Trans Syst Man Cybern 24(8):1279–1284CrossRef

Yang W, Muntz R, Wang W, Yang J (1997) STING: a statistical information grid approach to spatial data mining. In: Proceedings of 23rd International Conference on VLDB, pp 186–195

Yue S, Wei M, Wang J, Wang H (2008) A general grid-clustering approach. Pattern Recognit Lett 29:1372–1384CrossRef

Titel: GACH: a grid-based algorithm for hierarchical clustering of high-dimensional data
verfasst von: Eghbal G. Mansoori
Publikationsdatum: 01.05.2014
Verlag: Springer Berlin Heidelberg
Erschienen in: Soft Computing / Ausgabe 5/2014
Print ISSN: 1432-7643
Elektronische ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-013-1105-8

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Weitere Artikel der Ausgabe 5/2014

Benchmarking and comparison of nature-inspired population-based continuous optimisation algorithms

Fuzzy random multi-objective optimization based routing for wireless sensor networks

$$\lambda $$ λ -Statistical convergence of fuzzy numbers and fuzzy functions of order $$\theta $$ θ

Neural network learning from demonstration and epipolar geometry for visual control of a nonholonomic mobile robot

A PROMETHEE-based outranking method for multiple criteria decision analysis with interval type-2 fuzzy sets

An interval programming approach for the bilevel linear programming problem under fuzzy random environments