Skip to main content
Erschienen in: Soft Computing 22/2017

13.10.2016 | Foundations

Pseudo-centroid clustering

verfasst von: Fred Glover

Erschienen in: Soft Computing | Ausgabe 22/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Pseudo-centroid clustering replaces the traditional concept of a centroid expressed as a center of gravity with the notion of a pseudo-centroid (or a coordinate free centroid) which has the advantage of applying to clustering problems where points do not have numerical coordinates (or categorical coordinates that are translated into numerical form). Such problems, for which classical centroids do not exist, are particularly important in social sciences, marketing, psychology and economics, where distances are not computed from vector coordinates but rather are expressed in terms of characteristics such as affinity relationships, psychological preferences, advertising responses, polling data and market interactions, where distances, broadly conceived, measure the similarity (or dissimilarity) of characteristics, functions or structures. We formulate a K-PC algorithm analogous to a K-Means algorithm and focus on two key types of pseudo-centroids, MinMax-centroids and (weighted) MinSum-centroids, and describe how they, respectively, give rise to a K-MinMax algorithm and a K-MinSum algorithm which are analogous to a K-Means algorithm. The K-PC algorithms are able to take advantage of problem structure to identify special diversity-based and intensity-based starting methods to generate initial pseudo-centroids and associated clusters, accompanied by theorems for the intensity-based methods that establish their ability to obtain best clusters of a selected size from the points available at each stage of construction. We also introduce a regret-threshold PC algorithm that modifies the K-PC algorithm together with an associated diversification method and a new criterion for evaluating the quality of a collection of clusters.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
Communication from Cao, regarding applications of the method of Cao et al. (2015) to commercial applications.
 
2
The “intensity-based” terminology does not relate to “intensification methods” as used in metaheuristic algorithms. However, the “diversity-based” terminology carries an association with “diversification methods” used in such algorithms.
 
3
An interesting small value of k’ to initiate such an approach is 3.
 
4
This permits starting with kFirst = 2 and later increasing kFirst to kCheck + 1 accompanied by resetting kCheck = k for the final execution.
 
5
An exception occurs when \(k_{{o}} = 1\), at which point all of \(N_{{o}}\) is absorbed in any case.
 
6
A variation to accentuate the influence of points with larger \(D_{{o}}(h)\) values is to define Mean\(_{{o}}(h) = D_{{o}}(h)^{2}/{\vert }C(h){\vert }\).
 
7
The K-MinProduct problem can also be approached via the K-MinSum algorithm by using logarithms. The K-MinMedian problem is not to be confused with the K-Median problem, which is based on a different concept relying on coordinate vectors. We can also define a K-MedianMedian algorithm, among other novel variants.
 
8
See, for example, Glover (1994, 1997).
 
Literatur
Zurück zum Zitat Anderberg MR (1973) Cluster analysis for applications. Academic Press, LondonMATH Anderberg MR (1973) Cluster analysis for applications. Academic Press, LondonMATH
Zurück zum Zitat Anwar TM, Beck HW, Navathe SB (2012) Knowledge mining by imprecise querying: a classification-based approach. In: Proceedings of the eighth international conference on data engineering, IEEE computer society, pp 622–630. Washington, DC Anwar TM, Beck HW, Navathe SB (2012) Knowledge mining by imprecise querying: a classification-based approach. In: Proceedings of the eighth international conference on data engineering, IEEE computer society, pp 622–630. Washington, DC
Zurück zum Zitat Cao B, Glover F (2010) Creating balanced and connected clusters to improve service delivery routes in logistics planning. J Syst Sci Syst Eng 19(4):453–480CrossRef Cao B, Glover F (2010) Creating balanced and connected clusters to improve service delivery routes in logistics planning. J Syst Sci Syst Eng 19(4):453–480CrossRef
Zurück zum Zitat Cao B, Glover F, Rego C (2015) A tabu search algorithm for cohesive clustering problems. J Heuristics 21:457–477CrossRef Cao B, Glover F, Rego C (2015) A tabu search algorithm for cohesive clustering problems. J Heuristics 21:457–477CrossRef
Zurück zum Zitat Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 1:224–227CrossRef Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 1:224–227CrossRef
Zurück zum Zitat Estivill-Castro V, Lee I (2001) Fast spatial clustering with different metrics and in the presence of obstacles. GIS’01 142–147 Estivill-Castro V, Lee I (2001) Fast spatial clustering with different metrics and in the presence of obstacles. GIS’01 142–147
Zurück zum Zitat Fan B (2009) A hybrid spatial data clustering method for site selection: the data driven approach of GIS mining. Exp Syst Appl 36:3923–3936CrossRef Fan B (2009) A hybrid spatial data clustering method for site selection: the data driven approach of GIS mining. Exp Syst Appl 36:3923–3936CrossRef
Zurück zum Zitat Glover F (1994) Tabu search for nonlinear and parametric optimization (with links to genetic algorithms). Discrete Appl Math 49:231–255MathSciNetCrossRefMATH Glover F (1994) Tabu search for nonlinear and parametric optimization (with links to genetic algorithms). Discrete Appl Math 49:231–255MathSciNetCrossRefMATH
Zurück zum Zitat Glover F (1997) A template for scatter search and path relinking. In: J-K Hao, E Lutton, E Ronald, M Schoenauer, D Snyers (eds) Artificial evolution, lecture notes in computer science. 1363, pp 13–54. Springer, Berlin Glover F (1997) A template for scatter search and path relinking. In: J-K Hao, E Lutton, E Ronald, M Schoenauer, D Snyers (eds) Artificial evolution, lecture notes in computer science. 1363, pp 13–54. Springer, Berlin
Zurück zum Zitat Huang Z (1998) Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2:283–304CrossRef Huang Z (1998) Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2:283–304CrossRef
Zurück zum Zitat Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Englewood Cliffs, NJMATH Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Englewood Cliffs, NJMATH
Zurück zum Zitat Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20:359–392MathSciNetCrossRefMATH Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20:359–392MathSciNetCrossRefMATH
Zurück zum Zitat Kauffmann L, Rousseeuw PJ (1990) Finding groups in data—an introduction to cluster analysis. Wiley, London Kauffmann L, Rousseeuw PJ (1990) Finding groups in data—an introduction to cluster analysis. Wiley, London
Zurück zum Zitat Kochenberger G, Glover F, Alidaee B, Wang H (2005) Clustering of microarray data via clique partitioning. J Comb Optim 10:77–92 Kochenberger G, Glover F, Alidaee B, Wang H (2005) Clustering of microarray data via clique partitioning. J Comb Optim 10:77–92
Zurück zum Zitat Kwon Y-J, Kim JG, Seo J, Lee DH, Kim DS (2007) A tabu search algorithm using voronoi diagram for the capacitated vehicle routing problem. In: Proceeding of 5th international conference on computational science and applications. IEEE computer society, pp 480–485 Kwon Y-J, Kim JG, Seo J, Lee DH, Kim DS (2007) A tabu search algorithm using voronoi diagram for the capacitated vehicle routing problem. In: Proceeding of 5th international conference on computational science and applications. IEEE computer society, pp 480–485
Zurück zum Zitat MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. Proceedings of 5th Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley, pp 281–297 MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. Proceedings of 5th Berkeley symposium on mathematical statistics and probability. University of California Press, Berkeley, pp 281–297
Zurück zum Zitat Ng R, Han J (2002) CLARANS: a method for custering objects for satial data mining. IEEE Trans Knowl Data Eng 14(5):1003–1016CrossRef Ng R, Han J (2002) CLARANS: a method for custering objects for satial data mining. IEEE Trans Knowl Data Eng 14(5):1003–1016CrossRef
Zurück zum Zitat Paivinen N (2005) Clustering with a minimum spanning tree of scale-free-like structure. Pattern Recognit Lett 26(7):921–930CrossRef Paivinen N (2005) Clustering with a minimum spanning tree of scale-free-like structure. Pattern Recognit Lett 26(7):921–930CrossRef
Zurück zum Zitat Park H-S, Jun C-H (2009) A simple and fast algorithm for K-medoids clustering. Expert Syst Appl 36:3336–3341CrossRef Park H-S, Jun C-H (2009) A simple and fast algorithm for K-medoids clustering. Expert Syst Appl 36:3336–3341CrossRef
Zurück zum Zitat Ralambondrainy H (1995) A conceptual version of the k-means algorithm. Pattern Recognit Lett 16:1147–1157CrossRef Ralambondrainy H (1995) A conceptual version of the k-means algorithm. Pattern Recognit Lett 16:1147–1157CrossRef
Zurück zum Zitat Shamsul A, Inostroza-Ponta M, Mathieson L, Berretta R, Moscato P (2011) Clustering nodes in large-scale biological networks using external memory algorithms. In: Xiang et al (ed) ICA3PP 2011 workshops, part II, LNCS 7017, pp 375–386 Shamsul A, Inostroza-Ponta M, Mathieson L, Berretta R, Moscato P (2011) Clustering nodes in large-scale biological networks using external memory algorithms. In: Xiang et al (ed) ICA3PP 2011 workshops, part II, LNCS 7017, pp 375–386
Zurück zum Zitat Strehl A, Ghosh J (2002) Relationship-based clustering and visualization for high-dimensional data mining. INFORMS J Comput 1–23 Strehl A, Ghosh J (2002) Relationship-based clustering and visualization for high-dimensional data mining. INFORMS J Comput 1–23
Zurück zum Zitat Sudha KR, Raju YB, Sekhar AC (2012) Fuzzy C-means clustering for robust decentralized load frequency control of interconnected power system with generation rate constraint. Electr Power Energy Syst 37:58–66CrossRef Sudha KR, Raju YB, Sekhar AC (2012) Fuzzy C-means clustering for robust decentralized load frequency control of interconnected power system with generation rate constraint. Electr Power Energy Syst 37:58–66CrossRef
Zurück zum Zitat Xu Y, Olman V, Xu D (2001) Minimum spanning trees for gene expression data clustering. Genome Inf 12:24–33 Xu Y, Olman V, Xu D (2001) Minimum spanning trees for gene expression data clustering. Genome Inf 12:24–33
Metadaten
Titel
Pseudo-centroid clustering
verfasst von
Fred Glover
Publikationsdatum
13.10.2016
Verlag
Springer Berlin Heidelberg
Erschienen in
Soft Computing / Ausgabe 22/2017
Print ISSN: 1432-7643
Elektronische ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-016-2369-6

Weitere Artikel der Ausgabe 22/2017

Soft Computing 22/2017 Zur Ausgabe