
2018 | Original Paper | Book Chapter

2. Cluster Analysis

Authors: Sławomir T. Wierzchoń, Mieczysław A. Kłopotek

Published in: Modern Algorithms of Cluster Analysis

Publisher: Springer International Publishing


Abstract

This chapter outlines the major steps of cluster analysis. It starts with an informal introduction to clustering, its tools, methodology and applications. It then proceeds to formalise the problem of data clustering. Diverse measures of object similarity are described, based on quantitative features (like numerical measurement results), on qualitative features (like text), and on mixtures of both. Various ways in which such similarity measures can be exploited when defining clustering cost functions are presented. Afterwards, the major families of clustering algorithms are explained, including hierarchical, partitional, relational, graph-based, density-based and kernel methods. The underlying data representations and the interrelationships between the various methodologies are discussed. Possibilities of combining several algorithms for the analysis of the same data (ensemble clustering) are also presented. Finally, issues related to the easiness/hardness of data clustering tasks are recalled.


Fußnoten
1
The requirement of disjoint subsets is used in classical data analysis. In the general case, the distinguished groups may constitute a cover of the data set. This is the case, for instance, in fuzzy data analysis.
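The distinction between a crisp (disjoint) partition and a fuzzy cover can be sketched via membership matrices; this is a minimal illustration, not notation from the chapter:

```python
import numpy as np

# Crisp partition: each object belongs to exactly one of the clusters,
# encoded as a 0/1 membership matrix with exactly one 1 per row.
crisp = np.array([[1, 0],
                  [1, 0],
                  [0, 1]])

# Fuzzy cover: memberships are degrees in [0, 1]; an object may belong
# to several clusters at once, its degrees summing to 1 over the clusters.
fuzzy = np.array([[0.9, 0.1],
                  [0.6, 0.4],
                  [0.2, 0.8]])

# Both satisfy the row-sum constraint; only the crisp one yields
# disjoint subsets of the data set.
assert np.allclose(crisp.sum(axis=1), 1)
assert np.allclose(fuzzy.sum(axis=1), 1)
```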
 
2
Many of those listed have been modified several times over and successive editions have been published.
 
3
Watanabe, Satosi (1969). Knowing and Guessing: A Quantitative Study of Inference and Information. New York: Wiley, pp. 376–377.
 
4
The latter term is particularly justified, when we treat the measurements as mappings \(\mathbf {f} :\mathfrak {X} \rightarrow \mathbb {R}^n\) of the set of objects into a certain set of values. Then, \(\mathbf {x}_i=\mathbf {f}(\mathfrak {x}_i)\), and in the mathematical nomenclature \(\mathbf {x}_i\) is the image of the object \(\mathfrak {x}_i\).
 
5
See J. Gajek. Jan Czekanowski. Sylwetka uczonego. Nauka Polska, 6(2), 1958, 118–127.
 
6
See, e.g., A. Soltysiak, and P. Jaskulski. Czekanowski’s Diagram: A method of multidimensional clustering. In: J.A. Barceló, I. Briz and A. Vila (eds.) New Techniques for Old Times. CAA98. Computer Applications and Quantitative Methods in Archaeology. Proc. of the 26th Conf., Barcelona, March 1998 (BAR International Series 757). Archaeopress, Oxford 1999, pp. 175–184.
 
7
See, e.g., A. Wójcik. Zastosowanie diagramu Czekanowskiego do badania podobieństwa krajów Unii Europejskiej pod wzglȩdem pozyskiwania energii ze źródeł odnawialnych. Zarza̧dzanie i Finanse (J. of Management and Finance), 11(4/4), 353–365, 2013, http://zif.wzr.pl/pim/2013_4_4_25.pdf.
 
8
Note that, as Gower et al. [209] state (see their Theorem 1), any non-metric dissimilarity measure \( d(\mathfrak {z}, \mathfrak {y})\) for \(\mathfrak {z}, \mathfrak {y}\in {\mathfrak {X}}\), where \( {\mathfrak {X}}\) is finite, can be turned into a (metric) distance function \( d'(\mathfrak {z}, \mathfrak {y})= d(\mathfrak {z}, \mathfrak {y})+c \), where c is a constant with \(c\ge \max _{\mathfrak {x}, \mathfrak {y}, \mathfrak {z}\in \mathfrak {X}} \vert d(\mathfrak {x}, \mathfrak {y}) +d(\mathfrak {y}, \mathfrak {z})-d(\mathfrak {z}, \mathfrak {x}) \vert .\)
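This additive-constant trick can be sketched numerically. The helper name `metricize` is illustrative; it uses the tightest sufficient constant (the largest triangle-inequality violation), which the bound above dominates:

```python
import numpy as np
from itertools import permutations

def metricize(D):
    """Turn a symmetric non-metric dissimilarity matrix D (zero diagonal)
    into a metric by adding a constant c to every off-diagonal entry.
    c is the largest violation of the triangle inequality found in D."""
    n = D.shape[0]
    c = 0.0
    for i, j, k in permutations(range(n), 3):
        c = max(c, D[i, k] - D[i, j] - D[j, k])
    Dm = D + c * (1 - np.eye(n))          # keep the diagonal at zero
    return Dm, c

# A dissimilarity violating the triangle inequality: d(0,2) > d(0,1)+d(1,2).
D = np.array([[0., 1., 5.],
              [1., 0., 1.],
              [5., 1., 0.]])
Dm, c = metricize(D)

# After the shift, every triple satisfies the triangle inequality.
for i, j, k in permutations(range(3), 3):
    assert Dm[i, k] <= Dm[i, j] + Dm[j, k] + 1e-12
```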
 
9
In practice, a small number means one that is small compared to distances but still numerically significant under the available machine precision.
 
10
A real-valued function f(x) is said to be convex over the interval \([a, b]\subset \mathbb {R}\) if for any \(x_1,x_2\in [a, b]\) and any \(\lambda \in [0,1]\) we have \(\lambda f(x_1)+(1-\lambda )f(x_2)\ge f(\lambda x_1+(1-\lambda ) x_2)\). All three mentioned transformations of a distance into a similarity measure are in fact convex functions.
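The convexity condition can be checked numerically on typical distance-to-similarity transforms. Which three transforms the chapter has in mind is not restated here; the ones below (reciprocal, exponential, and linear on a bounded range) are common choices and serve only as an illustration:

```python
import math

# Typical transforms of a distance d >= 0 into a similarity; all are
# convex on the tested interval (an assumption for the linear one, which
# presumes distances bounded by 10).
transforms = {
    "reciprocal":  lambda d: 1.0 / (1.0 + d),
    "exponential": lambda d: math.exp(-d),
    "linear":      lambda d: 1.0 - d / 10.0,
}

def is_convex_on(f, xs, lambdas):
    """Check lam*f(x1)+(1-lam)*f(x2) >= f(lam*x1+(1-lam)*x2) on a grid."""
    for x1 in xs:
        for x2 in xs:
            for lam in lambdas:
                lhs = lam * f(x1) + (1 - lam) * f(x2)
                rhs = f(lam * x1 + (1 - lam) * x2)
                if lhs < rhs - 1e-12:
                    return False
    return True

xs = [i * 0.5 for i in range(21)]          # grid on [0, 10]
lambdas = [0.1, 0.25, 0.5, 0.75, 0.9]
assert all(is_convex_on(f, xs, lambdas) for f in transforms.values())
```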
 
11
In Chaps. 5 and 6 we will predominantly concentrate on similarity measure based clustering methods.
 
12
In Sect. 6.2 we discuss some similarity measures defined for data in the form of graphs/networks.
 
13
Note that the Euclidean distance has itself been investigated in great depth, see, e.g., [209]. See Sect. B.5 for a discussion of criteria for a dissimilarity matrix being a Euclidean distance matrix. Gower et al. [209] point out that any dissimilarity matrix D may be turned into a Euclidean distance matrix (see their Theorem 7) by adding an appropriate constant, e.g. \( d'(\mathfrak {z}, \mathfrak {y})= \sqrt{ d(\mathfrak {z}, \mathfrak {y})^2+h }\), where h is a constant such that \(h\ge -2\lambda _m\), \(\lambda _m\) being the smallest eigenvalue of \((\mathbf {I}-\mathbf {1}\mathbf {1}^T/m) (-\frac{1}{2} D_{sq}) (\mathbf {I}-\mathbf {1}\mathbf {1}^T/m)\), where \(D_{sq}\) is the matrix of squared elements of D and m is the number of rows/columns of D.
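A sketch of this construction in NumPy (the helper name `euclideanize` is illustrative; following Gower and Legendre's additive-constant result, \(h=-2\lambda_m\) is used, so that the doubly centered matrix of the corrected squared dissimilarities becomes positive semidefinite):

```python
import numpy as np

def euclideanize(D):
    """Make a dissimilarity matrix Euclidean by adding a constant h
    under the square root of its squared off-diagonal entries."""
    m = D.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m       # centering matrix I - 11^T/m
    B = J @ (-0.5 * D**2) @ J
    lam_min = np.linalg.eigvalsh(B).min()     # smallest eigenvalue
    h = max(0.0, -2.0 * lam_min)
    Dsq_new = D**2 + h * (np.ones((m, m)) - np.eye(m))
    return np.sqrt(Dsq_new), h

# A non-Euclidean dissimilarity (it even violates the triangle inequality).
D = np.array([[0., 1., 5.],
              [1., 0., 1.],
              [5., 1., 0.]])
D2, h = euclideanize(D)

# The doubly centered matrix of the corrected matrix is now PSD,
# i.e. D2 is a Euclidean distance matrix.
m = D.shape[0]
J = np.eye(m) - np.ones((m, m)) / m
B2 = J @ (-0.5 * D2**2) @ J
assert np.linalg.eigvalsh(B2).min() >= -1e-9
```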
 
14
See Chap. 3 in: C.H. Coombs, R.M. Dawes, A. Tversky. Mathematical Psychology: An Elementary Introduction. Prentice Hall, Englewood Cliffs, NJ 1970.
 
15
His profile can be found on the website http://www.piethein.com/.
 
16
Equivalently, one can say that two arbitrary vectors in \(\mathbb {R}^n\) are orthogonal [402, p. 7.1.3].
 
17
D.L. Chaffee, Applications of rate distortion theory to the bandwidth compression of speech, Ph.D. dissertation, Univ. California, Los Angeles, 1975. See also R.M. Gray, et al., Distortion measures for speech processing, IEEE Trans. on Acoustics, Speech and Signal Processing, 28(4), 367–376, Aug. 1980.
 
18
See F. Itakura, S. Saito, Analysis synthesis telephony based upon maximum likelihood method, Repts. of the 6th Intl. Cong. Acoust. Tokyo, C-5-5, C-17-20, 1968.
 
19
That is, all of its components are non-negative and \(\sum _{j=1}^n x_j=1\).
 
20
Note that the Bregman divergence is in general neither symmetric nor does it satisfy the triangle inequality; hence it is not a metric distance. However, there exist some families of Bregman divergences, like the Generalized Symmetrized Bregman and the Jensen-Bregman divergence, which under some conditions are the square of a metric distance and may even be embedded in a Euclidean space. For details see Acharyya, S., Banerjee, A., and Boley, D. (2013). Bregman Divergences and Triangle Inequality. In SDM'13 (SIAM International Conference on Data Mining), pp. 476–484.
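The asymmetry is easy to observe numerically. The sketch below builds the generic Bregman divergence \(D_\phi(x,y)=\phi(x)-\phi(y)-\langle\nabla\phi(y),x-y\rangle\) and instantiates it with the negative entropy, which yields the Kullback-Leibler divergence on the probability simplex:

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# phi = negative entropy on the probability simplex -> KL divergence.
phi = lambda p: np.sum(p * np.log(p))
grad_phi = lambda p: np.log(p) + 1.0

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])
d_pq = bregman(phi, grad_phi, p, q)
d_qp = bregman(phi, grad_phi, q, p)

assert d_pq >= 0 and d_qp >= 0        # nonnegativity (phi is convex)
assert abs(d_pq - d_qp) > 1e-6        # but not symmetric -> not a metric
# For this generator it indeed coincides with the KL divergence.
assert np.isclose(d_pq, np.sum(p * np.log(p / q)))
```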
 
21
See, e.g., Web-based handbook of statistics. Cluster analysis: Agglomeration. http://www.statsoft.pl/textbook/stathome.html.
 
22
E.g., in medical tests it is often assumed that the absence of a given feature in a patient is denoted by the symbol 0, and its presence by the symbol 1. In such situations it is better not to take the number of zeroes into account in the comparisons.
 
23
See, e.g., J. Mazerski. Podstawy chemometrii, Gdańsk 2004. Electronic edition available at http://www.pg.gda.pl/chem/Katedry/Leki_Biochemia/dydaktyka/chemometria/podstawy_chemometrii.zip.
 
25
See V. Asha, N.U. Bhajantri, and P. Nagabhushan: GLCM-based chi-square histogram distance for automatic detection of defects on patterned textures. Int. J. of Computational Vision and Robotics, 2(4), 302–313, 2011.
 
26
An extensive overview of distance measures is provided in the Encyclopedia of Distances by Michel Marie Deza and Elena Deza, Springer 2009, http://www.uco.es/users/ma1fegan/Comunes/asignaturas/vision/Encyclopedia-of-distances-2009.pdf.
 
27
M. Delattre and P. Hansen. "Bicriterion cluster analysis". IEEE Trans. on Pattern Analysis and Machine Intelligence, 2(4), pp. 277–291, 1980.
 
28
R.R. Sokal, F.J. Rohlf. The comparison of dendrograms by objective methods. Taxon, 11(2), 1962, 33–40.
 
29
Both distance matrices are symmetric; it suffices, therefore, to consider their elements situated above (or below) the main diagonal.
 
30
L.A. Goodman, W.H. Kruskal. Measures of association for cross classifications. J. of the American Statistical Association, 49(268), 1954, 732–764.
 
31
For computational reasons, the squared value of the distance is often used instead of the distance itself, which allows one to omit the square root operation.
 
32
In graph theory, a clique is a subgraph in which every two vertices are connected by an edge. If we connect with an edge those vertices that are only slightly dissimilar, we can treat a clique as a set of mutually similar vertices.
 
33
Peng and Wei note in [390] that the idea of representing the objective function appearing in the problem (2.33) as the minimisation of the trace of an appropriate matrix was first put forward by A.D. Gordon and J.T. Henderson in their paper "An algorithm for Euclidean sum of squares", which appeared in the 33rd volume of the journal Biometrics (1977, pp. 355–362). Gordon and Henderson, though, wrote that this idea had been suggested to them by an anonymous referee!
 
34
See Sect. 2.5.7.
 
35
See Definition B.2.4 on p. 320.
 
36
We take advantage here of the fact that \(\Vert A\Vert _F^2 = \text {tr}\,(A' A)\).
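This trace identity is quickly verified numerically (a throwaway check, with an arbitrary random matrix):

```python
import numpy as np

# Numerical check of the identity ||A||_F^2 = tr(A^T A).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

frob_sq = np.sum(A**2)            # definition of the squared Frobenius norm
trace_form = np.trace(A.T @ A)
assert np.isclose(frob_sq, trace_form)
```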
 
37
See, e.g., J. Hertz, A. Krogh, R.G. Palmer: Introduction to the Theory of Neural Computation. Santa Fe Institute Series, Addison-Wesley, 1991.
 
38
Georgiy Fedosiyevich Voronoi, whose name appears in Voronoi diagrams, was a Russian mathematician of Ukrainian extraction who lived in the years 1868–1908. Interesting information on this subject can be found in the 17th chapter of the popular book by Ian Stewart, Cows in the Maze: And Other Mathematical Explorations, published in 2010 by OUP Oxford.
 
39
E. Segal, A. Battle, and D. Koller. Decomposing gene expression into cellular processes. In: Proc. of the 8th Pacific Symp. on Biocomputing (PSB), 2003, pp. 89–100.
 
40
G. Cleuziou. A generalization of k-means for overlapping clustering. Université d’Orléans, LIFO, Rapport No RR-2007-15.
 
41
See, e.g., W. Didimo, F. Giordano, G. Liotta. Overlapping cluster planarity. In: Proc. APVIS 2007, pp. 73–80; and M. Fellows, J. Guo, C. Komusiewicz, R. Niedermeier, J. Uhlmann. Graph-based data clustering with overlaps. COCOON, 2009, pp. 516–526.
 
42
One can also proceed in the reverse direction: a partitional algorithm may be used to create a hierarchical clustering of the data, simply by applying the partitional algorithm recursively to previously obtained clusters, as in the case of the bisecting k-means algorithm in Sect. 3.1.5.2.
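This reverse direction can be sketched with a toy recursive splitter. This is a simplified illustration, not the book's algorithm: `two_means` is a bare-bones Lloyd iteration with k = 2, and the largest cluster is always the one split next:

```python
import numpy as np

def two_means(X, iters=20, seed=0):
    """Plain Lloyd iterations with k = 2; returns a boolean split mask."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), 2, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in (0, 1):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels == 1

def bisecting_kmeans(X, k):
    """Recursively split the currently largest cluster until k clusters remain."""
    clusters = [np.arange(len(X))]
    while len(clusters) < k:
        clusters.sort(key=len)
        big = clusters.pop()              # split the largest cluster
        mask = two_means(X[big])
        clusters += [big[~mask], big[mask]]
    return clusters

# Two well-separated point clouds.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
parts = bisecting_kmeans(X, 2)
assert len(parts) == 2
assert sum(map(len, parts)) == len(X)     # the parts cover all objects
```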
 
43
C.T. Zahn, Graph-theoretic methods for detecting and describing gestalt clusters. IEEE Trans. Comput., 20(1):68–86, 1971.
 
44
Which means that objects belonging to the same class are sufficiently similar to one another, while objects belonging to different groups are sufficiently dissimilar.
 
45
It arose from adding 14 randomly generated points to the set described in Chap. 7, namely data3_2.
 
46
An introduction to kernel functions should start with the definition of a vector space. Let V be a set, \(\oplus :V \times V \rightarrow V\) be the so-called inner operator (vector addition) and \(\odot :\mathbb {R} \times V \rightarrow V\) be the so-called outer operator (multiplication of a vector by a scalar). Then \((V,\oplus ,\odot )\) is called a vector space over the real numbers if the following properties hold: \(u \oplus (v \oplus w) = (u \oplus v ) \oplus w\); there exists \(0_V\in V\) such that \(v \oplus 0_V = 0_V \oplus v = v\); \(v \oplus u = u \oplus v\); \(\alpha \odot (u \oplus v) = (\alpha \odot u) \oplus (\alpha \odot v)\); \((\alpha + \beta ) \odot v = (\alpha \odot v) \oplus (\beta \odot v)\); \((\alpha \cdot \beta ) \odot v = \alpha \odot (\beta \odot v)\); \( 1 \odot v = v\).
Given a vector space, one can define an inner product space as a vector space V over the real numbers in which the scalar product \(\langle \cdot ,\cdot \rangle :V \times V \rightarrow {\mathbb R} \) satisfies: \(\langle {v},{v}\rangle \ge 0\); \(\langle {v},{v}\rangle = 0 \Leftrightarrow {v} = {0}\); \(\langle {u},{v}\rangle = {\langle {v},{u}\rangle }\); \(\langle {u},\lambda \odot {v}\rangle = \lambda \cdot {\langle {u},{v}\rangle }\); \(\langle {u},{v}+{w} \rangle = \langle {u},{v} \rangle + \langle {u},{w} \rangle \).
Now let X be the space of our input data. A mapping \(K : X \times X \rightarrow \mathbb {R}\) is called a kernel if there exists an inner product space \((F,\langle \cdot , \cdot \rangle )\) (F being called the feature space) and a mapping \(\Phi : X \rightarrow F\) such that in this inner product space \(K(x, y) = \langle \Phi (x), \Phi (y) \rangle \) for all \( x, y \in X\). As \(\langle \Phi (x), \Phi (y) \rangle =\langle \Phi (y), \Phi (x) \rangle \) holds for the inner product, obviously \(K(x,y)=K(y, x)\).
Mercer has shown that if \( \int _{x\in X} \int _{y\in X} K^2(x, y)\,dx\,dy <+\infty \) (compactness of K) and for each function \(f:X\rightarrow \mathbb {R}\) \( \int _{x\in X} \int _{y\in X} K(x, y)f(x)f(y)\,dx\,dy \ge 0 \) (positive semidefiniteness of K), then there exists a sequence of non-negative real numbers (eigenvalues) \(\lambda _1, \lambda _2,\ldots \) and a sequence of functions \( \phi _1,\phi _2,\ldots : X\rightarrow \mathbb {R}\) such that \(K(x, y)=\sum _{i=1}^\infty \lambda _i \phi _i(x) \phi _i(y)\), where the sum on the right-hand side is absolutely convergent. Moreover, \(\int _{x\in X} \phi _i(x)\phi _j(x)\,dx \) equals 1 if \(i=j\) and 0 otherwise.
Obviously, the feature map may take the form of an infinite vector \(\Phi =(\sqrt{\lambda _1}\phi _1, \sqrt{\lambda _2}\phi _2,\ldots )\).
The above kernel definition constitutes a special application of this general result to the case of a finite set X. The function K over a finite set X can then be represented as a matrix, which must therefore be positive semidefinite, and the function \(\Phi \) can be expressed for each \(x\in X\) as the vector of the corresponding eigenvector components multiplied by the square roots of the respective eigenvalues.
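The finite case is easy to demonstrate numerically: the Gram matrix of a Mercer kernel is positive semidefinite, and a finite-dimensional feature map can be read off its eigendecomposition. A minimal sketch with the Gaussian (RBF) kernel as an example:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))           # six sample points in the plane

# Gram matrix of the Gaussian (RBF) kernel, a standard Mercer kernel.
sq = ((X[:, None, :] - X[None, :, :])**2).sum(axis=2)
K = np.exp(-0.5 * sq)

lam, U = np.linalg.eigh(K)
assert lam.min() >= -1e-9                 # positive semidefinite

# Feature map: eigenvector components scaled by the square roots of the
# eigenvalues, so that K(x, y) = <Phi(x), Phi(y)>.
Phi = U * np.sqrt(np.clip(lam, 0, None))
assert np.allclose(Phi @ Phi.T, K)
```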
 
47
The mountain algorithm is a fast algorithm for determining approximate locations of centroids. See R.R. Yager and D.P. Filev. Approximate clustering via the mountain method. IEEE Trans. on Systems, Man and Cybernetics, 24(1994), 1279–1284.
 
48
Information on this data set is provided in Chap. 7.
 
49
This algorithm is presented in Sect. 3.1.5.4 of this book.
 
51
It is, actually, the approximate average length of a random vector having exactly this distribution. If \(\mathbf {x}\) is a vector whose coordinates are random numbers distributed according to \(N(0, \sigma )\), then its expected length is \(\mathbb {E}(\Vert \mathbf {x}\Vert ) = \sigma \sqrt{2}\, \Gamma ((n+1)/2)/\Gamma (n/2)\). In approximation, \(\mathbb {E}(\Vert \mathbf {x}\Vert ) \approx \sigma \sqrt{n}\,[1 - 1/(4n) + 1/(21n^2)]\), so that \(\mathbb {E}(\Vert \mathbf {x}\Vert )/(\sigma \sqrt{n}) \rightarrow 1\) as \(n \rightarrow \infty \).
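The exact Gamma-function formula \(\sigma\sqrt{2}\,\Gamma((n+1)/2)/\Gamma(n/2)\) can be compared against the standard series approximation \(\sigma\sqrt{n}\,(1-1/(4n)+1/(21n^2))\); a quick check shows they agree to within a percent already for small n:

```python
import math

def expected_norm(n, sigma=1.0):
    """Exact E||x|| for x ~ N(0, sigma^2 I_n)."""
    return sigma * math.sqrt(2) * math.gamma((n + 1) / 2) / math.gamma(n / 2)

def approx_norm(n, sigma=1.0):
    """Series approximation sigma*sqrt(n)*(1 - 1/(4n) + 1/(21n^2))."""
    return sigma * math.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n * n))

for n in (2, 5, 10, 100):
    exact, approx = expected_norm(n), approx_norm(n)
    assert abs(exact - approx) / exact < 1e-2
```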
 
52
Recall that on p. 12 we listed a number of requirements for the separation notion in general, and on pp. 35, 28, and 33 we have already mentioned a couple of other separation measures, intended for more general types of probability distributions.
 
53
This algorithm is commented upon in Sect. 3.2.
 
DOI: https://doi.org/10.1007/978-3-319-69308-8_2
