A survey of kernel and spectral methods for clustering
Introduction
Unsupervised data analysis using clustering algorithms provides a useful tool for exploring data structures. Clustering methods [1], [2] have been studied in many contexts and disciplines, such as data mining, document retrieval, image segmentation and pattern classification. The aim of clustering is to group patterns on the basis of a similarity (or dissimilarity) criterion, where groups (or clusters) are sets of similar patterns. Two crucial aspects of clustering are pattern representation and the similarity measure. Each pattern is usually represented by a set of features of the system under study, and a good choice of representation can lead to substantial improvements in clustering performance; whether an appropriate set of features can be chosen depends on the system under study. Once a representation is fixed, an appropriate similarity measure between patterns can be chosen. The most popular dissimilarity measure for metric representations is a distance, for instance the Euclidean one [3].
Clustering techniques can be roughly divided into two categories:
- hierarchical;
- partitioning.
Partitioning clustering methods seek a single partition of the data, without the nested sub-partitions produced by hierarchical algorithms, and are often based on the optimization of an appropriate objective function. The result is the creation of separating hypersurfaces among clusters. For instance, consider two nonlinear clusters as in Fig. 1. Standard partitioning methods (e.g., K-means, fuzzy c-means, SOM and neural gas) using two centroids are not able to separate the two rings in the desired way. The use of many centroids could solve this problem, but at the price of a complex description of a simple data set. For this reason several modifications and new approaches have been introduced to cope with this problem.
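The failure mode described above is easy to reproduce. The following sketch (using scikit-learn, which is not part of the survey; the data generation is our own illustrative construction) runs K-means with two centroids on two concentric rings and shows that each ring ends up divided between both clusters rather than kept whole:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def ring(radius, n):
    """Sample n points on a noisy circle of the given radius."""
    angles = rng.uniform(0, 2 * np.pi, n)
    r = radius + rng.normal(0, 0.05, n)
    return np.column_stack([r * np.cos(angles), r * np.sin(angles)])

# Two concentric rings: the desired clusters are inner vs. outer.
X = np.vstack([ring(1.0, 200), ring(3.0, 200)])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# K-means partitions the plane with a linear boundary through the
# centre, so both rings are cut in two instead of being separated.
inner, outer = labels[:200], labels[200:]
```

Both `inner` and `outer` contain a mix of the two cluster labels, confirming that a linear separating hypersurface cannot recover the ring structure.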
Among the many proposed modifications we can mention the fuzzy c-varieties [6], whose main drawback is that some a priori information on the shape of the clusters must be included. Recently, clustering methods that produce nonlinear separating hypersurfaces among clusters have been proposed. These algorithms can be divided into two big families: kernel and spectral clustering methods.
Regarding kernel clustering methods, several clustering algorithms have been modified to incorporate kernels (e.g., K-means, fuzzy c-means, SOM and neural gas). The use of kernels allows one to map data implicitly into a high-dimensional space called the feature space; computing a linear partitioning in this feature space results in a nonlinear partitioning in the input space.
Spectral clustering methods arise from concepts in spectral graph theory. The basic idea is to construct a weighted graph from the initial data set, where each node represents a pattern and each weighted edge encodes the similarity between two patterns. In this framework the clustering problem can be seen as a graph cut problem, which can be tackled by means of spectral graph theory. The core of this theory is the eigenvalue decomposition of the Laplacian matrix of the weighted graph obtained from the data; in fact, there is a close relationship between the second smallest eigenvalue of the Laplacian and the graph cut [7], [8].
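The pipeline just described (weighted graph, Laplacian, second-smallest eigenvector) can be sketched in a few lines. This is a minimal illustration with NumPy, assuming a Gaussian similarity for the edge weights and the unnormalized Laplacian; the function names and the toy data are ours, not the survey's:

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Adjacency matrix A with A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)           # no self-loops
    return A

def fiedler_partition(X, sigma=1.0):
    """Bipartition by the sign of the eigenvector of the second-smallest
    eigenvalue of the graph Laplacian (the Fiedler vector)."""
    A = gaussian_affinity(X, sigma)
    D = np.diag(A.sum(axis=1))         # degree matrix
    L = D - A                          # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]
    return (fiedler > 0).astype(int)

# Two well-separated groups: the sign of the Fiedler vector recovers them.
X = np.vstack([np.random.default_rng(0).normal(0, 0.1, (20, 2)),
               np.random.default_rng(1).normal(5, 0.1, (20, 2))])
labels = fiedler_partition(X, sigma=1.0)
```

Because the Fiedler vector is orthogonal to the constant eigenvector, it takes values of both signs, and for a near-disconnected graph it is approximately constant on each component, which is what makes the sign-based split meaningful.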
The aim of this paper is to present a survey of kernel and spectral clustering methods. Moreover, an explicit proof that these two approaches share the same mathematical foundation is reported. In particular, it has been shown by Dhillon et al. that kernel K-means and spectral clustering with the ratio association as the objective function are perfectly equivalent [9], [10], [11]. The core of both approaches lies in their ability to construct an adjacency structure between data without assuming a prefixed shape for the clusters. In their use of an adjacency structure these approaches bear a slight similarity to hierarchical methods, the main difference lying in the philosophy of the grouping procedure.
A comparison of some spectral clustering methods has recently been proposed in Ref. [12], while there are some theoretical results on the capabilities and convergence properties of spectral methods for clustering [13], [14], [15], [16]. Recently, kernel methods have also been applied to fuzzy c-varieties [17] with the aim of finding varieties in feature space, and there are some interesting clustering methods using kernels, such as Refs. [18], [19].
Since the choice of the kernel and of the similarity measure is crucial in these methods, many techniques have been proposed to learn the shape of the kernel automatically from data, as in Refs. [20], [21], [22], [23].
Regarding applications, most of these algorithms (e.g., Refs. [17], [21], [24]) have been applied to standard benchmarks such as Ionosphere [25], Breast Cancer [26] and Iris [27].1 Kernel fuzzy c-means, proposed in Refs. [28], [29], [30], has been applied to image segmentation problems, while in Ref. [31] it has been applied to handwritten digit recognition. There are applications of kernel clustering methods in face recognition using kernel SOM [32], in speech recognition [33] and in the prediction of crop yield from climate and plantation data [34]. Spectral methods have been applied to the clustering of artificial data [35], [36], to image segmentation [23], [37], [38], to bioinformatics [39], and to co-clustering problems of words and documents [40] and of genes and conditions [41]. A semi-supervised spectral approach to bioinformatics and handwritten character recognition has been proposed in Ref. [42]. The protein sequence clustering problem has been addressed using spectral techniques in Ref. [43] and kernel methods in Ref. [44].
In the next section we briefly introduce linear partitioning methods by recalling some basic crisp and fuzzy algorithms. The paper is then organized as follows: Section 3 presents the kernelized versions of the algorithms of Section 2, Section 4 discusses spectral clustering, and Section 5 reports the equivalence between spectral and kernel clustering methods. In the last section conclusions are drawn.
Section snippets
Partitioning methods
In this section we briefly recall some basic facts about partitioning clustering methods, reporting those for which a kernel version has been proposed. Let X = {x_1, ..., x_n} be a data set composed of n patterns, with each x_i ∈ R^d. The codebook (or set of centroids) is defined as the set V = {v_1, ..., v_c}, v_j ∈ R^d, typically with c ≪ n. Each element v_j is called a codevector (or centroid or prototype).2
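As a concrete instance of the codebook notation above, the following sketch implements plain K-means (Lloyd's algorithm), the prototypical partitioning method later kernelized in the survey. The helper names and the empty-cluster guard are ours, added for illustration under the standard Euclidean setting:

```python
import numpy as np

def nearest_codevector(X, V):
    """Assign each pattern to its closest codevector (Euclidean distance)."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)  # (n, c) squared distances
    return d2.argmin(axis=1)

def kmeans(X, c, iters=50, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), c, replace=False)]          # init codebook from data
    for _ in range(iters):
        labels = nearest_codevector(X, V)
        # Recompute each codevector as the mean of its cluster;
        # keep the old codevector if a cluster happens to be empty.
        V = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else V[j]
                      for j in range(c)])
    return V, labels

# Demo on two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
V, labels = kmeans(X, c=2)
```

On linearly separable data like this the algorithm recovers the two groups; on the ring-shaped data of Fig. 1 it cannot, which is the motivation for the kernelized versions of Section 3.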
Kernel clustering methods
In machine learning, kernel functions [59] were introduced by Aizerman et al. [60] in 1964. In 1995 Cortes and Vapnik introduced support vector machines (SVMs) [61], which perform better than other classification algorithms on several problems. The success of SVMs has led to extending the use of kernels to other learning algorithms (e.g., Kernel PCA [62]). The choice of the kernel is crucial to incorporate a priori knowledge on the application, for which it is possible to
Spectral clustering
Spectral clustering methods [39] have a strong connection with graph theory [7], [83]. A comparison of some spectral clustering methods has recently been proposed in Ref. [12]. Let X = {x_1, ..., x_n} be the set of patterns to cluster. Starting from X, we can build a complete, weighted undirected graph having a set of nodes corresponding to the patterns and edges defined through the adjacency (also affinity) matrix A. The adjacency matrix for a weighted graph is given by the
A unified view of spectral and kernel clustering methods
Recently, a possible connection between unsupervised kernel algorithms and spectral methods has been studied to find whether these two seemingly different approaches can be described under a more general framework. The hint for this unifying theory lies in the adjacency structure constructed by both approaches: in the spectral approach, the adjacency between patterns plays the role that the kernel function plays in kernel methods.
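The equivalence mentioned above can be sketched as follows (the notation here is our illustrative reconstruction of the standard trace formulation: π_j denotes the j-th cluster, Φ the feature map, K the kernel matrix, and Z the normalized cluster-indicator matrix):

```latex
% Kernel K-means minimizes the within-cluster scatter in feature space:
\min_{\{\pi_j\}} \; \sum_{j=1}^{c} \sum_{x_i \in \pi_j}
  \bigl\| \Phi(x_i) - m_j \bigr\|^2,
\qquad
m_j = \frac{1}{|\pi_j|} \sum_{x_h \in \pi_j} \Phi(x_h).
% With K_{ih} = \Phi(x_i)^{\top} \Phi(x_h) and
% Z_{ij} = |\pi_j|^{-1/2} if x_i \in \pi_j (0 otherwise), expanding gives
\sum_{j=1}^{c} \sum_{x_i \in \pi_j} \bigl\| \Phi(x_i) - m_j \bigr\|^2
  = \operatorname{tr}(K) - \operatorname{tr}\!\bigl(Z^{\top} K Z\bigr).
% Ratio association on a graph with adjacency matrix A maximizes
\sum_{j=1}^{c} \frac{\mathrm{links}(\pi_j,\pi_j)}{|\pi_j|}
  = \operatorname{tr}\!\bigl(Z^{\top} A Z\bigr).
```

Since tr(K) does not depend on the partition, minimizing the kernel K-means objective amounts to maximizing tr(Zᵀ K Z); taking K = A makes the two optimization problems coincide, which is the essence of the result of Dhillon et al. cited in the introduction.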
A direct connection between Kernel PCA and spectral
Conclusions
Clustering is a classical problem in pattern recognition. Recently spectral and kernel methods for clustering have provided new ideas and interpretations to the solution of this problem. In this paper spectral and kernel methods for clustering have been reviewed paying attention to fuzzy kernel methods for clustering and to the connection between kernel and spectral approaches. Unlike classical partitioning clustering algorithms they are able to produce nonlinear separating hypersurfaces among
Acknowledgments
This work was partially supported by the Italian Ministry of Education, University, and Research (PRIN 2004 Project 2004062740) and by the University of Genova.
About the Author—MAURIZIO FILIPPONE received a Laurea degree in Physics in 2004 from the University of Genova and now he is pursuing a Ph.D. in Computer Science at University of Genova. His research interests are focused on Machine Learning techniques and their applications to Bioinformatics and to time series analysis and forecasting.
References (94)
- Fuzzy c-varieties/elliptotypes clustering in reproducing kernel Hilbert space, Fuzzy Sets and Systems (2004)
- Clustering via Hilbert space, Phys. A Stat. Mech. Appl. (2001)
- A novel kernelized fuzzy c-means algorithm with application in medical image segmentation, Artif. Intell. Med. (2004)
- Support vector domain description, Pattern Recognition Lett. (1999)
- Data clustering: a review, ACM Comput. Surveys (1999)
- Survey of clustering algorithms, IEEE Trans. Neural Networks (2005)
- Pattern Classification and Scene Analysis (1973)
- Numerical Taxonomy: The Principles and Practice of Numerical Classification (1973)
- Hierarchical grouping to optimize an objective function, J. Am. Stat. Assoc. (1963)
- Pattern Recognition with Fuzzy Objective Function Algorithms (1981)
- Algebraic connectivity of graphs, Czechoslovak Math. J.
- Kernel k-means: spectral clustering and normalized cuts
- Limits of spectral clustering
- Clustering via kernel decomposition, IEEE Trans. Neural Networks
- On kernel-target alignment
- A novel kernel method for clustering, IEEE Trans. Pattern Anal. Mach. Intell.
- Classification of radar returns from the ionosphere using neural networks, Johns Hopkins APL Tech. Digest
- Multisurface method of pattern separation for medical diagnosis applied to breast cytology, Proc. Natl. Acad. Sci. USA
- The use of multiple measurements in taxonomic problems, Ann. Eugenics
- Robust image segmentation using FCM with spatial constraints based on new kernel-induced distance measure, IEEE Trans. Systems Man Cybernet. B
- On spectral clustering: analysis and an algorithm
- Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell.
- Spectral kernel methods for clustering
- Co-clustering documents and words using bipartite spectral graph partitioning
- Spectral biclustering of microarray data: coclustering genes and conditions, Genome Res.
- Semi-supervised graph clustering: a kernel approach
- Semi-supervised protein classification using cluster kernels, Bioinformatics
- An algorithm for vector quantizer design, IEEE Trans. Commun.
- Least squares quantization in PCM, IEEE Trans. Inf. Theory
About the Author—FRANCESCO CAMASTRA received Laurea degree in Physics in 1985 from University of Milan and the Ph.D. degree in Computer Science from University of Genova in 2004. He was the recipient of Eduardo Caianiello Award 2005 and since February 2006 he has been Assistant Professor at University of Napoli Parthenope. His research interests are Machine Learning and Kernel Methods.
About the Author—FRANCESCO MASULLI received the Laurea degree in Physics in 1976 from the University of Genova. He is currently an Associate Professor in Computer Science at the University of Genova. From 1983 to 2001 he has been an Assistant Professor at the University of Genova and from 2001 to 2005 an Associate Professor at University of Pisa. He authored or co-authored more than 120 scientific papers in Machine Learning, Neural Networks, Fuzzy Systems, Pattern Recognition and Bioinformatics.
About the Author—STEFANO ROVETTA received a Laurea degree in Electronic Engineering in 1992 and a Ph.D. degree in Models, Methods and Tools for Electronic and Electromagnetic Systems in 1996, both from the University of Genova. He is currently an Assistant Professor in Computer Science at the University of Genova. His current research interests include Machine Learning and Clustering techniques for high-dimensional biomedical data analysis and document analysis.