Top

Published in:

2011 | OriginalPaper | Chapter

6. K-Means and Related Clustering Methods

Author : Boris Mirkin

Published in: Core Concepts in Data Analysis: Summarization, Correlation and Visualization

Publisher: Springer London

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

K-Means is arguably the most popular data analysis method. The method outputs a partition of the entity set into clusters and centroids representing them. It is very intuitive and usually requires just a few pages to get presented. This text includes a number of less popular subjects that are important when using K-Means for real-world data analysis: Data standardization, especially, at mixed scales Innate tools for interpretation of clusters Analysis of examples of K-Means working and its failures Initialization – the choice of the number of clusters and location of centroids sVersions of K-Means such as incremental K-Means, nature inspired K-Means, and entity-centroid “medoid” methods are presented. Three modifications of K-Means onto different cluster structures are given:. Fuzzy K-Means for finding fuzzy clusters, Expectation-Maximization (EM) for finding probabilistic clusters, and Kohonen self-organizing maps (SOM) that tie up the sought clusters to a visually convenient two-dimensional grid. Equivalent reformulations of K-Means criterion are described – they can yield different algorithms for K-Means. One of these is explained at length: K-Means extends Principal component analysis to the case of binary scoring factors, which yields the so-called Anomalous cluster method, a key to an intelligent version of K-Means with automated choice of the number of clusters and their initialization.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

previous chapter Principal Component Analysis and SVD

next chapter Hierarchical Clustering

Bandyopadhyay, S., Maulik, U.: An evolutionary technique based on K-means algorithm for optimal clustering in R^N. Inf. Sci. 146, 221–237 (2002).MathSciNetMATHCrossRef

Bezdek, J., Keller, J., Krisnapuram, R., Pal, M.: Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. Kluwer Academic Publishers, Dordrecht (1999).

Cangelosi, R., Goriely, A.: Component retention in principal component analysis with application to cDNA microarray data. Biol. Direct. 2, 2 (2007). http://www.biolgy-direct.com/con-tent/2/1/2.CrossRef

Green, S.B., Salkind, N.J.: Using SPSS for the Windows and Mackintosh: Analyzing and Understanding Data. Prentice Hall, Upper Saddle River, NJ (2003).

Hartigan, J.A.: Clustering Algorithms. Wiley, New York (1975).

Kaufman. L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York (1990).

Kendall, M.G., Stewart, A.: Advanced Statistics: Inference and Relationship (3d edition). Griffin, London (1973). ISBN: 0852642156.

Kettenring, J.: The practice of cluster analysis. J. Classific. 23, 3–30 (2006).MathSciNetCrossRef

Kohonen, T.: Self-Organizing Maps. Springer, Berlin (1995).CrossRef

Kryshtanowski, A.: Analysis of Sociology Data with SPSS. Higher School of Economics Publishers, Moscow (in Russian) (2008).

Lu, Y., Lu, S., Fotouhi, F., Deng, Y., Brown, S.: Incremental genetic algorithm and its application in gene expression data analysis. BMC Bioinform. 5,172 (2004).CrossRef

Ming-Tso Chiang, M., Mirkin, B.: Intelligent choice of the number of clusters in K-Means clustering: an experimental study with different cluster spreads. J. Classif. 27(1), 3–40 (2010).CrossRef

Mirkin, B.: Clustering for Data Mining: A Data Recovery Approach. Chapman & Hall/CRC, Roca Baton, FL (2005). ISBN 1-58488-534-3.

Mirkin, B.: Mathematical Classification and Clustering. Kluwer Academic Press, Boston-Dordrecht (1996).

Murthy, C.A., Chowdhury, N.: In search of optimal clusters using genetic algorithms. Pattern Recognit. Lett. 17, 825–832 (1996).

Nascimento, S., Franco, P.: Unsupervised fuzzy clustering for the segmentation and annotation of upwelling regions in sea surface temperature images. In: Gama, J. (ed.) Discovery Science, LNCS 5808, pp. 212–226. Springer (2009).

Nascimento, S.: Fuzzy Clustering via Proportional Membership Model. ISO Press, Amsterdam (2005).

Paterlini, S., Krink, T.: Differential evolution and PSO in partitional clustering. Comput. Stat. Data Anal. 50, 1220–1247 (2006).MathSciNetCrossRef

Stanforth, R., Mirkin, B., Kolossov, E.: A measure of domain of applicability for QSAR modelling based on intelligent K-Means clustering. QSAR Comb. Sci. 26(7), 837–844 (2007).CrossRef

Title: K-Means and Related Clustering Methods
Author: Boris Mirkin
Publisher: Springer London
Book: Core Concepts in Data Analysis: Summarization, Correlation and Visualization
Print ISBN: 978-0-85729-286-5

Electronic ISBN: 978-0-85729-287-2

Copyright Year: 2011
DOI: https://doi.org/10.1007/978-0-85729-287-2_6

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner