Pattern Recognition

Volume 41, Issue 1, January 2008, Pages 176-190

A survey of kernel and spectral methods for clustering

https://doi.org/10.1016/j.patcog.2007.05.018

Abstract

Clustering algorithms are a useful tool to explore data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem, with a special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The presented kernel clustering methods are kernel versions of many classical clustering algorithms, e.g., K-means, SOM and neural gas. Spectral clustering arises from concepts in spectral graph theory, and the clustering problem is configured as a graph cut problem where an appropriate objective function has to be optimized. Since these two seemingly different approaches have been shown to share the same mathematical foundation, an explicit proof that they optimize the same objective is reported. Besides, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm.

Introduction

Unsupervised data analysis using clustering algorithms provides a useful tool to explore data structures. Clustering methods [1], [2] have been addressed in many contexts and disciplines such as data mining, document retrieval, image segmentation and pattern classification. The aim of clustering methods is to group patterns on the basis of a similarity (or dissimilarity) criterion, where groups (or clusters) are sets of similar patterns. Crucial aspects in clustering are pattern representation and the similarity measure. Each pattern is usually represented by a set of features of the system under study. It is very important to notice that a good choice of pattern representation can lead to improvements in clustering performance. Whether it is possible to choose an appropriate set of features depends on the system under study. Once a representation is fixed, it is possible to choose an appropriate similarity measure among patterns. The most popular dissimilarity measure for metric representations is the distance, for instance the Euclidean one [3].

Clustering techniques can be roughly divided into two categories:

  • hierarchical;

  • partitioning.

Hierarchical clustering techniques [1], [4], [5] are able to find structures which can be further divided into substructures, and so on recursively. The result is a hierarchical structure of groups known as a dendrogram.

Partitioning clustering methods try to obtain a single partition of the data, without the nested sub-partitions produced by hierarchical algorithms, and are often based on the optimization of an appropriate objective function. The result is the creation of separating hypersurfaces among clusters. For instance, we can consider two nonlinear clusters as in Fig. 1. Standard partitioning methods (e.g., K-means, fuzzy c-means, SOM and neural gas) using two centroids are not able to separate the two rings in the desired way. The use of many centroids could solve this problem, but would provide a needlessly complex description of a simple data set. For this reason, several modifications and new approaches have been introduced to cope with this problem.
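The failure described above is easy to reproduce. The following minimal sketch (pure Python, with an illustrative two-rings data set and a plain Lloyd/K-means implementation, not taken from the paper) shows that two centroids cannot recover the two rings, because the induced separating hypersurface is a hyperplane:

```python
import math, random

random.seed(0)

# Two concentric rings centred at the origin, radii 1.0 and 3.0.
rings = []
for r in (1.0, 3.0):
    rings.append([(r * math.cos(2 * math.pi * k / 60),
                   r * math.sin(2 * math.pi * k / 60)) for k in range(60)])
X = rings[0] + rings[1]
true_label = [0] * 60 + [1] * 60

def lloyd(X, c, iters=50):
    """Plain Lloyd/K-means with c codevectors."""
    V = random.sample(X, c)
    labels = [0] * len(X)
    for _ in range(iters):
        # Assignment step: each pattern goes to its nearest codevector.
        labels = [min(range(c),
                      key=lambda k: (x[0] - V[k][0]) ** 2 + (x[1] - V[k][1]) ** 2)
                  for x in X]
        # Update step: each codevector becomes the mean of its cluster.
        for k in range(c):
            members = [x for x, l in zip(X, labels) if l == k]
            if members:
                V[k] = (sum(p[0] for p in members) / len(members),
                        sum(p[1] for p in members) / len(members))
    return labels

labels = lloyd(X, 2)
# Since the K-means decision boundary between two centroids is a
# hyperplane, at least one cluster mixes points from both rings.
mixed = any(len({true_label[i] for i, l in enumerate(labels) if l == k}) == 2
            for k in set(labels))
```

No half-plane can contain the whole inner ring while excluding the outer one, so a pure two-cluster solution is geometrically impossible here.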

Among the many modifications we can mention fuzzy c-varieties [6], whose main drawback is that some a priori information on the shape of the clusters must be included. Recently, some clustering methods that produce nonlinear separating hypersurfaces among clusters have been proposed. These algorithms can be divided into two main families: kernel and spectral clustering methods.

Regarding kernel clustering methods, several clustering algorithms have been modified to incorporate kernels (e.g., K-means, fuzzy c-means, SOM and neural gas). Kernels allow data to be mapped implicitly into a high-dimensional space called the feature space; computing a linear partitioning in this feature space results in a nonlinear partitioning in the input space.
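The implicit mapping works because feature-space geometry can be expressed entirely through kernel evaluations. The following sketch (an illustrative helper, with the Gaussian kernel chosen as an example) computes the squared distance between two mapped patterns without ever constructing the mapping phi:

```python
import math

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def feature_space_dist2(x, y, K):
    """Squared distance between phi(x) and phi(y) in feature space,
    obtained with the kernel trick and never computing phi:
    ||phi(x) - phi(y)||^2 = K(x, x) - 2 K(x, y) + K(y, y)."""
    return K(x, x) - 2.0 * K(x, y) + K(y, y)

d2 = feature_space_dist2((0.0, 0.0), (1.0, 1.0), rbf_kernel)
# For the RBF kernel K(x, x) = 1, so d2 = 2 - 2 K(x, y) lies in [0, 2).
```

Any algorithm whose updates depend on the data only through such distances or inner products can therefore be "kernelized" by substituting the kernel for the Euclidean inner product.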

Spectral clustering methods arise from concepts in spectral graph theory. The basic idea is to construct a weighted graph from the initial data set, where each node represents a pattern and each weighted edge encodes the similarity between two patterns. In this framework the clustering problem can be seen as a graph cut problem, which can be tackled by means of spectral graph theory. The core of this theory is the eigenvalue decomposition of the Laplacian matrix of the weighted graph obtained from the data. In fact, there is a close relationship between the second smallest eigenvalue of the Laplacian and the graph cut [7], [8].

The aim of this paper is to present a survey of kernel and spectral clustering methods. Moreover, an explicit proof of the fact that these two approaches share the same mathematical foundation is reported. In particular, it has been shown by Dhillon et al. that kernel K-means and spectral clustering with the ratio association as the objective function are perfectly equivalent [9], [10], [11]. The core of both approaches lies in their ability to construct an adjacency structure from the data without assuming a prefixed cluster shape. These approaches bear a slight similarity to hierarchical methods in their use of an adjacency structure, the main difference being the philosophy of the grouping procedure.

A comparison of some spectral clustering methods has recently been proposed in Ref. [12], while there are some theoretical results on the capabilities and convergence properties of spectral methods for clustering [13], [14], [15], [16]. Recently, kernel methods have also been applied to fuzzy c-varieties [17], with the aim of finding varieties in feature space, and there are other interesting kernel-based clustering methods such as Refs. [18], [19].

Since the choice of the kernel and of the similarity measure is crucial in these methods, many techniques have been proposed to automatically learn the shape of the kernel from data, as in Refs. [20], [21], [22], [23].

Regarding the applications, most of these algorithms (e.g., Refs. [17], [21], [24]) have been applied to standard benchmarks such as Ionosphere [25], Breast Cancer [26] and Iris [27]. Kernel fuzzy c-means proposed in Refs. [28], [29], [30] has been applied to image segmentation problems, while in Ref. [31] it has been applied to handwritten digit recognition. There are applications of kernel clustering methods to face recognition using kernel SOM [32], to speech recognition [33] and to prediction of crop yield from climate and plantation data [34]. Spectral methods have been applied to clustering of artificial data [35], [36], to image segmentation [23], [37], [38], to bioinformatics [39], and to co-clustering problems of words and documents [40] and genes and conditions [41]. A semi-supervised spectral approach to bioinformatics and handwritten character recognition has been proposed in Ref. [42]. The protein sequence clustering problem has been addressed using spectral techniques in Ref. [43] and kernel methods in Ref. [44].

In the next section we briefly introduce linear partitioning methods by recalling some basic crisp and fuzzy algorithms. The paper is then organized as follows: Section 3 shows the kernelized versions of the algorithms presented in Section 2, Section 4 discusses spectral clustering, and Section 5 reports the equivalence between spectral and kernel clustering methods. In the last section conclusions are drawn.


Partitioning methods

In this section we briefly recall some basic facts about partitioning clustering methods, and we report the clustering methods for which a kernel version has been proposed. Let X = {x1, …, xn} be a data set composed of n patterns, where each xi ∈ R^d. The codebook (or set of centroids) V is defined as the set V = {v1, …, vc}, typically with c ≪ n. Each element vi ∈ R^d is called a codevector (or centroid or prototype).
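Among the fuzzy partitioning algorithms recalled in this section, fuzzy c-means is the one most often kernelized later in the paper. The following minimal sketch (pure Python, with toy data and deterministic initialization chosen here for illustration) alternates the standard membership and codebook updates:

```python
import math

def fcm(X, c, m=2.0, iters=50):
    """Minimal fuzzy c-means sketch: alternates the membership update
    u_ik = 1 / sum_l (d_ik / d_il)^(2/(m-1)) with the codebook update
    v_k = sum_i u_ik^m x_i / sum_i u_ik^m, for fuzzifier m > 1."""
    d = len(X[0])
    V = [list(X[k]) for k in range(c)]   # initialize codevectors on data
    U = [[0.0] * c for _ in X]           # membership matrix
    for _ in range(iters):
        # Membership update (distances clamped to avoid division by zero).
        for i, x in enumerate(X):
            dist = [max(1e-12, math.sqrt(sum((x[j] - V[k][j]) ** 2
                                             for j in range(d))))
                    for k in range(c)]
            for k in range(c):
                U[i][k] = 1.0 / sum((dist[k] / dist[l]) ** (2.0 / (m - 1.0))
                                    for l in range(c))
        # Codevector update: weighted mean with weights u_ik^m.
        for k in range(c):
            w = [U[i][k] ** m for i in range(len(X))]
            s = sum(w)
            V[k] = [sum(w[i] * X[i][j] for i in range(len(X))) / s
                    for j in range(d)]
    return U, V

X = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0)]
U, V = fcm(X, 2)
# The memberships of every pattern sum to one over the c clusters.
```

Setting all memberships to 0/1 recovers the crisp K-means scheme as a special case.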

Kernel clustering methods

In machine learning, the use of kernel functions [59] was introduced by Aizerman et al. [60] in 1964. In 1995 Cortes and Vapnik introduced support vector machines (SVMs) [61], which perform better than other classification algorithms on several problems. The success of SVMs has led to the extension of kernels to other learning algorithms (e.g., Kernel PCA [62]). The choice of the kernel is crucial to incorporate a priori knowledge on the application, for which it is possible to
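As an illustration of how a clustering algorithm is kernelized, the following sketch (an illustrative implementation of the standard kernel K-means assignment step, not the paper's exact formulation) computes feature-space distances to cluster centroids using only kernel evaluations, so the centroids themselves never need to be represented explicitly:

```python
import math

def rbf(x, y, sigma=2.0):
    """Gaussian (RBF) kernel used here as an example."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))

def dist2_to_center(x, cluster, K):
    """Squared feature-space distance from phi(x) to the implicit
    centroid of a cluster C, via the kernel trick:
    K(x,x) - (2/|C|) sum_j K(x,x_j) + (1/|C|^2) sum_{i,j} K(x_i,x_j)."""
    if not cluster:
        return float("inf")
    n = len(cluster)
    t1 = K(x, x)
    t2 = sum(K(x, xj) for xj in cluster) / n
    t3 = sum(K(xi, xj) for xi in cluster for xj in cluster) / n ** 2
    return t1 - 2.0 * t2 + t3

def kernel_kmeans_step(X, labels, c, K):
    """One assignment step of kernel K-means: each pattern moves to the
    cluster whose implicit feature-space centroid is closest."""
    clusters = [[X[i] for i in range(len(X)) if labels[i] == k]
                for k in range(c)]
    return [min(range(c), key=lambda k: dist2_to_center(x, clusters[k], K))
            for x in X]

X = [(0.0, 0.0), (0.2, 0.1), (4.0, 4.0), (4.2, 4.1)]
labels = kernel_kmeans_step(X, [0, 0, 1, 1], 2, rbf)
```

Iterating this step until the labels stop changing yields the full kernel K-means algorithm; a linear kernel K(x, y) = <x, y> recovers ordinary K-means.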

Spectral clustering

Spectral clustering methods [39] have a strong connection with graph theory [7], [83]. A comparison of some spectral clustering methods has recently been proposed in Ref. [12]. Let X = {x1, …, xn} be the set of patterns to cluster. Starting from X, we can build a complete weighted undirected graph G(V, A) having a set of nodes V = {v1, …, vn} corresponding to the n patterns and edges defined through the n×n adjacency (also called affinity) matrix A. The adjacency matrix for a weighted graph is given by the
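The pipeline sketched above can be illustrated end to end. The following is a minimal sketch, assuming a Gaussian similarity for the adjacency matrix and the unnormalized Laplacian L = D - A, with numpy used for the eigendecomposition; the bipartition is read off the sign pattern of the eigenvector associated with the second smallest eigenvalue (the Fiedler vector):

```python
import numpy as np

# Toy data: two well-separated groups of patterns.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [4.0, 4.0], [4.2, 4.1], [4.1, 4.3]])

# Adjacency (affinity) matrix with a Gaussian similarity,
# A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero on the diagonal.
sigma = 1.0
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
A = np.exp(-sq / (2.0 * sigma ** 2))
np.fill_diagonal(A, 0.0)

# Unnormalized graph Laplacian L = D - A, with D the degree matrix.
D = np.diag(A.sum(axis=1))
L = D - A

# Eigendecomposition: the smallest eigenvalue of L is 0 (constant
# eigenvector); the eigenvector of the second smallest eigenvalue
# (the Fiedler vector) encodes the graph cut via its sign pattern.
w, v = np.linalg.eigh(L)          # eigenvalues in ascending order
fiedler = v[:, 1]
labels = (fiedler > 0).astype(int)
```

On this toy data the sign pattern of the Fiedler vector recovers the two groups, in line with the relationship between the second smallest eigenvalue and the cut mentioned above.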

A unified view of spectral and kernel clustering methods

Recently, a possible connection between unsupervised kernel algorithms and spectral methods has been studied to find whether these two seemingly different approaches can be described under a more general framework. The hint for this unifying theory lies in the adjacency structure constructed by both approaches. In the spectral approach, an adjacency structure between patterns plays a role analogous to that of the kernel function in kernel methods.
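The equivalence proved by Dhillon et al. can be stated compactly. Writing Z for a normalized cluster-indicator matrix (notation introduced here for illustration), both objectives reduce to the same trace maximization:

```latex
% Let Z_{ik} = |C_k|^{-1/2} if x_i \in C_k and 0 otherwise.
% Kernel K-means: minimizing the feature-space distortion
%   \sum_k \sum_{x_i \in C_k} \lVert \phi(x_i) - m_k \rVert^2
% is equivalent to maximizing a trace involving the kernel matrix K:
\max_{Z}\ \operatorname{trace}\!\left(Z^{\top} K Z\right),
% while ratio association, \max_{\{C_k\}} \sum_k \mathrm{assoc}(C_k)/|C_k|,
% written with the adjacency matrix A, reads
\max_{Z}\ \operatorname{trace}\!\left(Z^{\top} A Z\right).
% Identifying the kernel matrix K with the adjacency matrix A makes
% the two objectives coincide.
```

This is the sense in which the adjacency matrix of spectral methods and the kernel matrix of kernel methods play interchangeable roles.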

A direct connection between Kernel PCA and spectral

Conclusions

Clustering is a classical problem in pattern recognition. Recently, spectral and kernel methods for clustering have provided new ideas and interpretations for the solution of this problem. In this paper spectral and kernel methods for clustering have been reviewed, paying attention to fuzzy kernel methods for clustering and to the connection between kernel and spectral approaches. Unlike classical partitioning clustering algorithms, they are able to produce nonlinear separating hypersurfaces among

Acknowledgments

This work was partially supported by the Italian Ministry of Education, University, and Research (PRIN 2004 Project 2004062740) and by the University of Genova.

About the Author—MAURIZIO FILIPPONE received a Laurea degree in Physics in 2004 from the University of Genova and now he is pursuing a Ph.D. in Computer Science at University of Genova. His research interests are focused on Machine Learning techniques and their applications to Bioinformatics and to time series analysis and forecasting.

References (94)

  • F.R.K. Chung, Spectral Graph Theory, CBMS Regional Conference Series in Mathematics, vol. 92, American Mathematical...
  • M. Fiedler, Algebraic connectivity of graphs, Czechoslovak Math. J. (1973)
  • I.S. Dhillon, Y. Guan, B. Kulis, Weighted graph cuts without eigenvectors: a multilevel approach, IEEE Trans. Pattern...
  • I.S. Dhillon, Y. Guan, B. Kulis, A unified view of kernel k-means, spectral clustering and graph partitioning,...
  • I.S. Dhillon et al., Kernel k-means: spectral clustering and normalized cuts
  • D. Verma, M. Meila, A comparison of spectral clustering algorithms, Technical Report, Department of CSE University of...
  • R. Kannan, S. Vempala, A. Vetta, On clusterings: good, bad, and spectral, in: Proceedings of the 41st Annual Symposium...
  • U. von Luxburg, M. Belkin, O. Bousquet, Consistency of spectral clustering, Technical Report 134, Max Planck Institute...
  • U. von Luxburg et al., Limits of spectral clustering
  • H. Zha, X. He, C.H.Q. Ding, M. Gu, H.D. Simon, Spectral relaxation for k-means clustering, in: NIPS, 2001, pp....
  • A.S. Have et al., Clustering via kernel decomposition, IEEE Trans. Neural Networks (2006)
  • F.R. Bach, M.I. Jordan, Learning spectral clustering, Technical Report UCB/CSD-03-1249, EECS Department, University of...
  • N. Cristianini et al., On kernel-target alignment
  • I. Fischer, I. Poland, New methods for spectral clustering, Technical Report IDSIA-12-04, IDSIA,...
  • M. Meila, J. Shi, Learning segmentation by random walks, in: NIPS, 2000, pp....
  • F. Camastra et al., A novel kernel method for clustering, IEEE Trans. Pattern Anal. Mach. Intell. (2005)
  • V.G. Sigillito et al., Classification of radar returns from the ionosphere using neural networks, Johns Hopkins APL Tech. Digest (1989)
  • W.H. Wolberg et al., Multisurface method of pattern separation for medical diagnosis applied to breast cytology, Proc. Natl. Acad. Sci. USA (1990)
  • R.A. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugenics (1936)
  • S.-C. Chen et al., Robust image segmentation using FCM with spatial constraints based on new kernel-induced distance measure, IEEE Trans. Systems Man Cybernet. B (2004)
  • D.-Q. Zhang, S.-C. Chen, Z.-S. Pan, K.-R. Tan, Kernel-based fuzzy clustering incorporating spatial constraints for...
  • T. Graepel, K. Obermayer, Fuzzy topographic kernel clustering, in: W. Brauer (Ed.), Proceedings of the Fifth GI...
  • X. Tan, S. Chen, Z.H. Zhou, F. Zhang, Robust face recognition from a single training image per person with kernel-based...
  • D.S. Satish, C.C. Sekhar, Kernel based clustering and vector quantization for speech recognition, in: Proceedings of...
  • A. Majid Awan, Mohd Noor Md. Sap, An intelligent system based on kernel methods for crop yield prediction, in: W.K. Ng,...
  • A.Y. Ng et al., On spectral clustering: analysis and an algorithm
  • A. Rahimi, B. Recht, Clustering with normalized cuts is clustering with a hyperplane, Statist. Learn. Comput. Vis....
  • J. Shi et al., Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. (2000)
  • A.N. Srivastava, Mixture density Mercer kernels: a method to learn kernels directly from data, in: SDM,...
  • N. Cristianini et al., Spectral kernel methods for clustering
  • I.S. Dhillon, Co-clustering documents and words using bipartite spectral graph partitioning
  • Y. Kluger et al., Spectral biclustering of microarray data: coclustering genes and conditions, Genome Res. (2003)
  • B. Kulis et al., Semi-supervised graph clustering: a kernel approach
  • A. Paccanaro, C. Chennubhotla, J.A. Casbon, M.A.S. Saqi, Spectral clustering of protein sequences, in: International...
  • J. Weston et al., Semi-supervised protein classification using cluster kernels, Bioinformatics (2005)
  • Y. Linde et al., An algorithm for vector quantizer design, IEEE Trans. Commun. (1980)
  • S. Lloyd, Least squares quantization in PCM, IEEE Trans. Inf. Theory (1982)

    About the Author—FRANCESCO CAMASTRA received Laurea degree in Physics in 1985 from University of Milan and the Ph.D. degree in Computer Science from University of Genova in 2004. He was the recipient of Eduardo Caianiello Award 2005 and since February 2006 he has been Assistant Professor at University of Napoli Parthenope. His research interests are Machine Learning and Kernel Methods.

    About the Author—FRANCESCO MASULLI received the Laurea degree in Physics in 1976 from the University of Genova. He is currently an Associate Professor in Computer Science at the University of Genova. From 1983 to 2001 he has been an Assistant Professor at the University of Genova and from 2001 to 2005 an Associate Professor at University of Pisa. He authored or co-authored more than 120 scientific papers in Machine Learning, Neural Networks, Fuzzy Systems, Pattern Recognition and Bioinformatics.

    About the Author—STEFANO ROVETTA received a Laurea degree in Electronic Engineering in 1992 and a Ph.D. degree in Models, Methods and Tools for Electronic and Electromagnetic Systems in 1996, both from the University of Genova. He is currently an Assistant Professor in Computer Science at the University of Genova. His current research interests include Machine Learning and Clustering techniques for high-dimensional biomedical data analysis and document analysis.
