
2018 | OriginalPaper | Chapter

5. Spectral Clustering

Authors: Sławomir T. Wierzchoń, Mieczysław A. Kłopotek

Published in: Modern Algorithms of Cluster Analysis

Publisher: Springer International Publishing


Abstract

This chapter discusses clustering methods based on similarities between pairs of objects. Such knowledge does not require that the objects be embedded in a metric space. Instead, this local knowledge supports a graphical representation displaying the relationships among the objects of a given data set. The problem of data clustering then transforms into the problem of graph partitioning, and the partitioning is obtained by analysing the eigenvectors of the graph Laplacian, a basic tool of spectral graph theory. We explain how various forms of the graph Laplacian are used in various graph partitioning criteria, and how these translate into particular algorithms. There is a strong and fascinating relationship between the graph Laplacian and random walks on a graph. In particular, it allows one to formulate a number of other clustering criteria and, from them, further data clustering algorithms. We briefly review these problems. It should be noted that the eigenvectors deliver a so-called spectral representation of the data items. Unfortunately, this representation is fixed for a given data set, and adding or deleting items destroys it. Thus we discuss recently invented methods of out-of-sample spectral clustering that overcome this disadvantage. Although spectral methods are successful in extracting non-convex groups in data, forming the graph Laplacian is memory-consuming and computing its eigenvectors is time-consuming. Thus we discuss various local methods in which only the relevant part of the graph is considered. Moreover, we mention a number of methods allowing fast and approximate computation of the eigenvectors.
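To make the pipeline concrete (similarity matrix → graph Laplacian → eigenvectors → k-means on the spectral representation), here is a minimal Python sketch of the normalized variant. It is an illustration under assumed inputs, not the chapter's reference implementation: S is a given symmetric non-negative similarity matrix and k the desired number of clusters.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clustering(S, k):
    """Minimal normalized spectral clustering (Ng-Jordan-Weiss style)."""
    d = S.sum(axis=1)                                  # node degrees
    d_is = 1.0 / np.sqrt(np.maximum(d, 1e-12))         # D^{-1/2} as a vector
    # symmetric normalized Laplacian: L = I - D^{-1/2} S D^{-1/2}
    L = np.eye(S.shape[0]) - d_is[:, None] * S * d_is[None, :]
    # eigenvectors of the k smallest eigenvalues = spectral representation
    _, vecs = np.linalg.eigh(L)                        # ascending eigenvalues
    U = vecs[:, :k]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    _, labels = kmeans2(U, k, minit='++')              # cluster embedded rows
    return labels
```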


Footnotes
1
I.e. the time complexity of the eigen-decomposition is \(O(m^3)\), and the matrix S has, in general, \(O(m^2)\) entries.
 
2
The problem of finding the K nearest neighbours in high-dimensional spaces is far from trivial. Fortunately, there are several efficient approximate techniques for solving it. See e.g. P. Indyk, Dimensionality reduction techniques for proximity problems. Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA’00, pp. 371–378. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000. Another promising solution is the FLANN library described in [363].
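For illustration, the graph-construction step itself can be done with an exact tree-based search when the dimension is moderate (approximate methods such as FLANN replace the query step for large, high-dimensional data). A hypothetical sketch, with the Gaussian kernel width sigma as an assumed free parameter:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial import cKDTree

def knn_similarity_graph(X, k, sigma=1.0):
    """Sparse symmetric similarity matrix built on K nearest neighbours.

    X is an (m, n) data matrix with one object per row.
    """
    tree = cKDTree(X)
    dist, idx = tree.query(X, k=k + 1)      # +1: each point is its own NN
    m = X.shape[0]
    rows = np.repeat(np.arange(m), k)
    cols = idx[:, 1:].ravel()               # drop the self-neighbour column
    sims = np.exp(-dist[:, 1:].ravel() ** 2 / (2.0 * sigma ** 2))
    S = csr_matrix((sims, (rows, cols)), shape=(m, m))
    return S.maximum(S.T)                   # symmetrize the kNN relation
```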
 
3
See e.g. J.A. Albano and D.W. Messinger, Euclidean commute time distance embedding and its application to spectral anomaly detection. Proceedings of SPIE 8390, Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XVIII, 83902G (May 8, 2012); https://doi.org/10.1117/12.918411.
 
4
In fact, all these matrices can be viewed as instances of a generalized Laplacian, also called a discrete Schrödinger operator: a matrix M with the elements \(m_{uv} < 0\) if \(\{u, v\}\) is an edge of \(\Gamma \), and \(m_{uv}=0\) if the nodes u and \(v\ne u\) are not adjacent. There are no constraints on the diagonal entries of M [76, 139].
 
5
Recall that \(\Gamma \) is a simple graph.
 
6
Formally, a square real matrix A is said to be diagonally dominant if \(|a_{ii}| \ge \sum _{j \ne i}|a_{ij}|\) holds for every row i, and strictly diagonally dominant if \(|a_{ii}| > \sum _{j \ne i}|a_{ij}|\); see e.g. [354].
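A direct transcription of this definition into code (a hypothetical helper):

```python
import numpy as np

def is_diagonally_dominant(A, strict=False):
    """Check (strict) diagonal dominance of a square real matrix A, row-wise."""
    diag = np.abs(np.diag(A))
    off = np.abs(A).sum(axis=1) - diag      # row sums of off-diagonal entries
    return bool(np.all(diag > off)) if strict else bool(np.all(diag >= off))
```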
 
7
In general this value may be a complex number, as in [415].
 
8
You can imagine that \(\Gamma \) is a graph with (arbitrarily) oriented edges; for instance, the edge \(\{i, j\}\) is converted into \(i \rightarrow j\) if \(i<j\); see e.g. [354].
 
9
Hence we used the symbol \(\nabla \) to denote this matrix.
 
10
More generally, a square matrix A with complex entries is said to be hermitian if each of its entries \(a_{ij}\) is equal to the complex conjugate of the entry \(a_{ji}\). In such a case the eigenvalues of A are real numbers. If all the entries of A are real, “hermitian” simply means symmetric.
 
11
Note that if the eigenproblem has a solution (\(\lambda ,\mathbf {x}\)) then, as \(\mathbf {x}\) is always assumed to be non-zero, it has infinitely many solutions of the form (\(\lambda , r \mathbf {x}\)), where r is any non-zero real number. Therefore, unless we specify otherwise, we choose among them the eigenvector of unit length, \(\mathbf {x}^\texttt {T}\mathbf {x}=1\).
 
12
Even if there are degenerate eigenvalues, it is always possible to find an orthogonal basis of \(\mathbb {R}^m\) consisting of m eigenvectors of A.
 
13
The nodes can always be rearranged so that the matrix S has the appropriate form for this ordering of the eigenvalues to hold.
 
14
Note that we refrained from normalising the eigenvectors of asymmetric matrices. This is due to the distinction between left and right eigenvectors.
 
15
For instance, an outlier would have a small degree, as it is not close to any other point.
 
16
More generally this applies to any hermitian matrix.
 
17
This dataset can be obtained e.g. from the repository maintained by Alex Arenas, http://deim.urv.cat/~alexandre.arenas/data/welcome.htm.
 
18
See e.g. R. Bhatia, Matrix Analysis, Springer, 1997.
 
19
I.e. such a matrix that \(U^\texttt {T}= U^{-1}\) or, equivalently, \(U^\texttt {T}U = \mathbb {I}\). If U has complex entries, this condition becomes \(U^* = U^{-1}\), where \(U^*\) stands for the conjugate transpose of U.
 
20
See C. Eckart and G. Young, The approximation of one matrix by another of lower rank, Psychometrika 1:211–218 (1936), and L. Mirsky, Symmetric gauge functions and unitarily invariant norms, Quart. J. Math. Oxford 11:50–59 (1960).
 
21
A norm \(\Vert \cdot \Vert \) is called unitarily invariant if \(\Vert UAV\Vert =\Vert A\Vert \) for any pair of orthogonal matrices U and V.
 
22
These terms are quoted in: G.H. Golub, M.W. Mahoney, P. Drineas, and L.-H. Lim. “Bridging the gap between numerical linear algebra, theoretical computer science, and data applications”. SIAM News, 39(8), October 2006.
 
23
The case of a directed graph is considered in [142] and in [112].
 
24
\(assoc(A, B)=\sum _{i \in A,\, j \in B} s_{ij}\) measures the strength of association between the sets A and B. Note that in an undirected graph assoc(A, A) counts each edge weight between nodes i, j twice: once as \(s_{ij}\) and once as \(s_{ji}\). If A and B are two disjoint sets, then \(assoc(A,B)=\mathsf{cut}(A, B)\).
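In code, with a dense similarity matrix S and arrays of node indices A and B (a hypothetical sketch, consistent with the double counting noted above):

```python
import numpy as np

def assoc(S, A, B):
    """assoc(A, B): sum of s_ij over i in A, j in B (A, B are index arrays)."""
    return S[np.ix_(A, B)].sum()

# For disjoint A and B this equals cut(A, B); assoc(A, A) counts each
# undirected edge weight twice, once as s_ij and once as s_ji.
```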
 
25
In particular, the matrix from Fig. 5.10d is rather a block-band matrix. Its block-diagonal structure can be recovered by using the method described in [178].
 
26
See Eq. (5.44).
 
27
An excellent example of this situation is the old town (medina) of Fez in Morocco: there are almost 9,000 streets!
 
28
See M. Meilă: The multicut lemma. Technical Report 417, Department of Statistics, University of Washington, 2001.
 
29
Since the minimal eigenvalue of L is \(\lambda _1=0\), this matrix is singular. Fouss et al. describe in [184] an efficient procedure for computing the pseudoinverse of L in the case of large data sets.
 
30
These probabilities can be computed as \((\mathbb {I}- Q)^{-1}R\), where R is the matrix of size \(|V\backslash A| \times |A|\) obtained from P by deleting the rows corresponding to the states in A and the columns corresponding to the states in \(V\backslash A\). The matrix Q, in turn, is obtained from P by deleting the rows and columns corresponding to the states in A.
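A sketch of this computation for a given row-stochastic transition matrix P and an index array A of absorbing states (the function name is hypothetical):

```python
import numpy as np

def absorption_probabilities(P, A):
    """Probability that a walk started in each transient state is absorbed
    in each state of A, computed as (I - Q)^{-1} R."""
    T = np.setdiff1d(np.arange(P.shape[0]), A)   # transient states (V \ A)
    Q = P[np.ix_(T, T)]                          # moves among transient states
    R = P[np.ix_(T, A)]                          # moves from transient into A
    return np.linalg.solve(np.eye(len(T)) - Q, R)
```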
 
31
I.e. \(c_{ij}\) is proportional to the effective resistance between node i and node j of the corresponding electrical network, in which a resistance \(1/s_{ij}\) is assigned to each edge. See [161] for details.
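For reference, this proportionality can be made explicit through the pseudoinverse of L: writing \(l^+_{ij}\) for the entries of \(L^+\) and \(\mathrm{vol}(\Gamma )=\sum _i d_i\) for the total degree, the standard identity is \(c_{ij} = \mathrm{vol}(\Gamma )\,(l^+_{ii} + l^+_{jj} - 2l^+_{ij})\), the effective resistance being \(r_{ij} = l^+_{ii} + l^+_{jj} - 2l^+_{ij}\).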
 
32
The “standard” spectral clustering can be termed 2-spectral clustering, as it exploits the 2-Laplacian, i.e. the “standard” graph Laplacian. Consult the URL http://www.ml.uni-saarland.de/code/oneSpectralClustering/oneSpectralClustering.htm for more details, software and papers.
 
33
Its MATLAB implementation is provided by Szlam and can be found at http://www.sci.ccny.cuny.edu/~szlam/gtvc.tar.gz.
 
34
Usually it is a diagonal matrix. If \(W=\mathbb {I}\) the problem becomes unweighted kernel PCA.
 
35
In fact instead of L we can consider any generalized Laplacian matrix [76, 139].
 
36
This notion is discussed in the next chapter.
 
37
See e.g. Watts, D.J.; Strogatz, S.H. Collective dynamics of ‘small-world’ networks. Nature 393(6684), 1998, 409–410. https://doi.org/10.1038/30918.
 
38
Spielman and Teng add one more condition but, to keep the presentation simple, we omit it.
 
39
Note that \(\check{P}^j\) can be computed iteratively as \(\check{P}\check{P}^{j-1}\), \(j=2,3,\dots \). Since \(\check{P}\) is a sparse matrix, this multiplication can be organized efficiently even in the case of large graphs.
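In practice one usually needs \(\check{P}^j\) applied to a vector rather than the full matrix power, which keeps every step sparse. A SciPy sketch (names hypothetical):

```python
import numpy as np
import scipy.sparse as sp

def apply_power(P_check, p0, t):
    """Return P_check^t applied to p0 via t sparse matrix-vector products,
    each costing O(nnz(P_check)) operations."""
    p = np.asarray(p0, dtype=float)
    for _ in range(t):
        p = P_check @ p
    return p

# toy usage: random walk on a 4-cycle (P is symmetric here, so this also
# evolves a probability distribution started at node 0)
A = sp.csr_matrix(np.array([[0., 1., 0., 1.],
                            [1., 0., 1., 0.],
                            [0., 1., 0., 1.],
                            [1., 0., 1., 0.]]))
deg = np.asarray(A.sum(axis=1)).ravel()
P = sp.diags(1.0 / deg) @ A                  # row-stochastic walk matrix
print(apply_power(P, np.array([1., 0., 0., 0.]), t=3))
```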
 
40
See e.g. P. Buneman, S. Khanna, W.-C. Tan: Data provenance: Some basic issues. Proc. of 20th Conf. Foundations of Software Technology and Theoretical Computer Science, FST TCS 2000, New Delhi, India. Springer 2000, pp. 87–93.
 
41
See P. Macko, D. Margo, M. Seltzer: Local clustering in provenance graphs. Proc. of the 22nd ACM Int. Conf. on Information & Knowledge Management. ACM, 2013, pp. 835–840.
 
42
B. Saha, et al. Dense subgraphs with restrictions and applications to gene annotation graphs. In Research in Computational Molecular Biology, Springer 2010, pp. 456–472.
 
43
S.E.G. Villarreal, R.F. Brena: Topic mining based on graph local clustering. In: I. Batyrshin, G. Sidorov, eds. Advances in Soft Computing. LNCS 7095, Springer 2011, pp 201–212.
 
44
They depend on whether \(S_{11}\) is positive definite.
 
45
See e.g. H. Lee, et al. Efficient sparse coding algorithms. In Proc. of the 19-th Int’l Conf. on Neural Information Processing Systems, NIPS’06, pp. 801–808, 2006, MIT Press.
 
46
Its MATLAB implementation can be found here: http://www.di.ens.fr/~fbach/kernel-ica/.
 
47
In MATLAB, when the command [Q,R] = qr(A, 0) is issued and the number of rows of A is larger than the number of columns, an “economy” factorization is returned, omitting the zero rows of R and the corresponding columns of Q.
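For comparison, an aside not from the book: NumPy returns the same economy-size factors by default.

```python
import numpy as np

A = np.random.rand(1000, 50)
Q, R = np.linalg.qr(A)        # mode='reduced' is the default
print(Q.shape, R.shape)       # (1000, 50) (50, 50): economy-size factors
```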
 
48
See e.g.: D. Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. of Comput. and Syst. Sci., 66(4):671–687, 2003.
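A minimal sketch of such a “binary coin” projection, with the target dimension d as an assumed parameter and the \(1/\sqrt{d}\) scaling preserving squared distances in expectation (the function name is hypothetical):

```python
import numpy as np

def binary_random_projection(X, d, seed=None):
    """Project the rows of X to d dimensions with a random +/-1 matrix."""
    rng = np.random.default_rng(seed)
    R = rng.choice([-1.0, 1.0], size=(X.shape[1], d))
    return X @ R / np.sqrt(d)
```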
 
49
A concise description of Tchebyshev approximation can be found e.g. in W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery: Numerical Recipes in C. The Art of Scientific Computing, 2nd Edition, Cambridge University Press, 1992.
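A quick illustration with NumPy's Chebyshev utilities, fitting an assumed toy target function at Chebyshev nodes:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

nodes = np.cos(np.pi * (np.arange(32) + 0.5) / 32)   # Chebyshev nodes in [-1, 1]
coef = C.chebfit(nodes, np.exp(nodes), deg=8)        # degree-8 approximation
x = np.linspace(-1.0, 1.0, 201)
print(np.max(np.abs(C.chebval(x, coef) - np.exp(x))))  # maximum error on a grid
```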
 
Metadata
Title
Spectral Clustering
Authors
Sławomir T. Wierzchoń
Mieczysław A. Kłopotek
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-69308-8_5
