nach oben

Soft Computing

Erschienen in:

13.10.2016 | Foundations

Pseudo-centroid clustering

verfasst von: Fred Glover

Erschienen in: Soft Computing | Ausgabe 22/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Pseudo-centroid clustering replaces the traditional concept of a centroid expressed as a center of gravity with the notion of a pseudo-centroid (or a coordinate free centroid) which has the advantage of applying to clustering problems where points do not have numerical coordinates (or categorical coordinates that are translated into numerical form). Such problems, for which classical centroids do not exist, are particularly important in social sciences, marketing, psychology and economics, where distances are not computed from vector coordinates but rather are expressed in terms of characteristics such as affinity relationships, psychological preferences, advertising responses, polling data and market interactions, where distances, broadly conceived, measure the similarity (or dissimilarity) of characteristics, functions or structures. We formulate a K-PC algorithm analogous to a K-Means algorithm and focus on two key types of pseudo-centroids, MinMax-centroids and (weighted) MinSum-centroids, and describe how they, respectively, give rise to a K-MinMax algorithm and a K-MinSum algorithm which are analogous to a K-Means algorithm. The K-PC algorithms are able to take advantage of problem structure to identify special diversity-based and intensity-based starting methods to generate initial pseudo-centroids and associated clusters, accompanied by theorems for the intensity-based methods that establish their ability to obtain best clusters of a selected size from the points available at each stage of construction. We also introduce a regret-threshold PC algorithm that modifies the K-PC algorithm together with an associated diversification method and a new criterion for evaluating the quality of a collection of clusters.

Vorheriger Artikel -algebras

Nächster Artikel Cooperative particle swarm optimization using MapReduce

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Nur mit Berechtigung zugänglich

Communication from Cao, regarding applications of the method of Cao et al. (2015) to commercial applications.

The “intensity-based” terminology does not relate to “intensification methods” as used in metaheuristic algorithms. However, the “diversity-based” terminology carries an association with “diversification methods” used in such algorithms.

An interesting small value of k’ to initiate such an approach is 3.

This permits starting with kFirst = 2 and later increasing kFirst to kCheck + 1 accompanied by resetting kCheck = k for the final execution.

An exception occurs when \(k_{{o}} = 1\), at which point all of \(N_{{o}}\) is absorbed in any case.

A variation to accentuate the influence of points with larger \(D_{{o}}(h)\) values is to define Mean\(_{{o}}(h) = D_{{o}}(h)^{2}/{\vert }C(h){\vert }\).

The K-MinProduct problem can also be approached via the K-MinSum algorithm by using logarithms. The K-MinMedian problem is not to be confused with the K-Median problem, which is based on a different concept relying on coordinate vectors. We can also define a K-MedianMedian algorithm, among other novel variants.

See, for example, Glover (1994, 1997).

Anderberg MR (1973) Cluster analysis for applications. Academic Press, LondonMATH

Anwar TM, Beck HW, Navathe SB (2012) Knowledge mining by imprecise querying: a classification-based approach. In: Proceedings of the eighth international conference on data engineering, IEEE computer society, pp 622–630. Washington, DC

Cao B, Glover F (2010) Creating balanced and connected clusters to improve service delivery routes in logistics planning. J Syst Sci Syst Eng 19(4):453–480CrossRef

Cao B, Glover F, Rego C (2015) A tabu search algorithm for cohesive clustering problems. J Heuristics 21:457–477CrossRef

Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 1:224–227CrossRef

Estivill-Castro V, Lee I (2001) Fast spatial clustering with different metrics and in the presence of obstacles. GIS’01 142–147

Fan B (2009) A hybrid spatial data clustering method for site selection: the data driven approach of GIS mining. Exp Syst Appl 36:3923–3936CrossRef

Glover F (1994) Tabu search for nonlinear and parametric optimization (with links to genetic algorithms). Discrete Appl Math 49:231–255MathSciNetCrossRefMATH

Glover F (1997) A template for scatter search and path relinking. In: J-K Hao, E Lutton, E Ronald, M Schoenauer, D Snyers (eds) Artificial evolution, lecture notes in computer science. 1363, pp 13–54. Springer, Berlin

Huang Z (1998) Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2:283–304CrossRef

Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Englewood Cliffs, NJMATH

Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20:359–392MathSciNetCrossRefMATH

Kauffmann L, Rousseeuw PJ (1990) Finding groups in data—an introduction to cluster analysis. Wiley, London

Kochenberger G, Glover F, Alidaee B, Wang H (2005) Clustering of microarray data via clique partitioning. J Comb Optim 10:77–92

Kwon Y-J, Kim JG, Seo J, Lee DH, Kim DS (2007) A tabu search algorithm using voronoi diagram for the capacitated vehicle routing problem. In: Proceeding of 5th international conference on computational science and applications. IEEE computer society, pp 480–485

MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. Proceedings of 5th Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley, pp 281–297

Ng R, Han J (2002) CLARANS: a method for custering objects for satial data mining. IEEE Trans Knowl Data Eng 14(5):1003–1016CrossRef

Paivinen N (2005) Clustering with a minimum spanning tree of scale-free-like structure. Pattern Recognit Lett 26(7):921–930CrossRef

Park H-S, Jun C-H (2009) A simple and fast algorithm for K-medoids clustering. Expert Syst Appl 36:3336–3341CrossRef

Ralambondrainy H (1995) A conceptual version of the k-means algorithm. Pattern Recognit Lett 16:1147–1157CrossRef

Shamsul A, Inostroza-Ponta M, Mathieson L, Berretta R, Moscato P (2011) Clustering nodes in large-scale biological networks using external memory algorithms. In: Xiang et al (ed) ICA3PP 2011 workshops, part II, LNCS 7017, pp 375–386

Strehl A, Ghosh J (2002) Relationship-based clustering and visualization for high-dimensional data mining. INFORMS J Comput 1–23

Sudha KR, Raju YB, Sekhar AC (2012) Fuzzy C-means clustering for robust decentralized load frequency control of interconnected power system with generation rate constraint. Electr Power Energy Syst 37:58–66CrossRef

Xu Y, Olman V, Xu D (2001) Minimum spanning trees for gene expression data clustering. Genome Inf 12:24–33

Titel: Pseudo-centroid clustering
verfasst von: Fred Glover
Publikationsdatum: 13.10.2016
Verlag: Springer Berlin Heidelberg
Erschienen in: Soft Computing / Ausgabe 22/2017
Print ISSN: 1432-7643
Elektronische ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-016-2369-6

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Weitere Artikel der Ausgabe 22/2017

An uncertain currency model with floating interest rates

A comparative study on swarm intelligence for structure learning of Bayesian networks

Bidirectional reservoir networks trained using SVM privileged information for manufacturing process modeling

Segmentation and enhancement of brain MR images using fuzzy clustering based on information theory

Bearing fault identification of three-phase induction motors bases on two current sensor strategy

A zSlices-based general type-2 fuzzy logic system for users-centric adaptive learning in large-scale e-learning platforms