Skip to main content
Top
Published in: Advances in Data Analysis and Classification 1/2013

01-03-2013 | Regular Article

Random walk distances in data clustering and applications

Authors: Sijia Liu, Anastasios Matzavinos, Sunder Sethuraman

Published in: Advances in Data Analysis and Classification | Issue 1/2013

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

In this paper, we develop a family of data clustering algorithms that combine the strengths of existing spectral approaches to clustering with various desirable properties of fuzzy methods. In particular, we show that the developed method “Fuzzy-RW,” outperforms other frequently used algorithms in data sets with different geometries. As applications, we discuss data clustering of biological and face recognition benchmarks such as the IRIS and YALE face data sets.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Appendix
Available only for authorised users
Footnotes
1
For further discussion of the emerging role of data geometry in the development of data clustering algorithms (see, e.g., Chen and Lerman 2009; Haralick and Harpaz 2007; Coifman and Lafon 2006).
 
2
As it is usually the case in data clustering applications, the employed distance measures do not have to necessarily satisfy the properties of a metric (see, e.g., Chen and Zhang 2004).
 
3
Data clustering on linear manifolds, or affine spaces was first introduced by Bock (1974). Adopting the terminology of Haralick and Harpaz (2007), we say that \(L\) is a linear manifold in a vector space \(V\) if for some vector subspace \(S\) of \(V\) and some translation \(t\in V,\,L=\{t+s \mid s\in S\}.\)
 
Literature
go back to reference Abonyi J, Feil B (2007) Cluster analysis for data mining and system identification. Birkhäuser, BaselMATH Abonyi J, Feil B (2007) Cluster analysis for data mining and system identification. Birkhäuser, BaselMATH
go back to reference Arias-Castro E, Chen G, Lerman G (2010) Spectral clustering based on local linear approximations. arXiv:1001.1323v1 Arias-Castro E, Chen G, Lerman G (2010) Spectral clustering based on local linear approximations. arXiv:1001.1323v1
go back to reference Belhumeur P, Hespanha J, Kriegman D (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720CrossRef Belhumeur P, Hespanha J, Kriegman D (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720CrossRef
go back to reference Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 16:1373–1396CrossRef Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 16:1373–1396CrossRef
go back to reference Ben-Hur A, Elisseeff A, Guyon I (2002) A stability based method for discovering structure in clustered data. In: Pacific Symposium on Biocomputing, pp 6–17 Ben-Hur A, Elisseeff A, Guyon I (2002) A stability based method for discovering structure in clustered data. In: Pacific Symposium on Biocomputing, pp 6–17
go back to reference Bezdek J, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10: 191–203 Bezdek J, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10: 191–203
go back to reference Bezdek J, Hall L, Clark M, Goldgof D, Clarke L (1997) Medical image analysis with fuzzy models. Stat Methods Med Res 6:191–214CrossRef Bezdek J, Hall L, Clark M, Goldgof D, Clarke L (1997) Medical image analysis with fuzzy models. Stat Methods Med Res 6:191–214CrossRef
go back to reference Bock H-H (1974) Automatische Klassifikation. Theoretische und praktische Methoden zur Gruppierung und Strukturierung von Daten (Clusteranalyse). Vandenhoek & Ruprecht, Göttingen (in German) Bock H-H (1974) Automatische Klassifikation. Theoretische und praktische Methoden zur Gruppierung und Strukturierung von Daten (Clusteranalyse). Vandenhoek & Ruprecht, Göttingen (in German)
go back to reference Bock H-H (1987) On the interface between cluster analysis, principal component clustering, and multidimensional scaling. In: Bozdogan H, Gupta A (eds) Multivariate statistical modeling and data analysis. Reidel, Dordrecht, pp 17–34CrossRef Bock H-H (1987) On the interface between cluster analysis, principal component clustering, and multidimensional scaling. In: Bozdogan H, Gupta A (eds) Multivariate statistical modeling and data analysis. Reidel, Dordrecht, pp 17–34CrossRef
go back to reference Brémaud P (1999) Markov chains: Gibbs fields, Monte Carlo simulation, and queues. Springer, New YorkMATH Brémaud P (1999) Markov chains: Gibbs fields, Monte Carlo simulation, and queues. Springer, New YorkMATH
go back to reference Brunelli R, Poggio T (1993) Face recognition: features vs. templates. IEEE Trans Pattern Anal Mach Intell 15(10):1042–1053CrossRef Brunelli R, Poggio T (1993) Face recognition: features vs. templates. IEEE Trans Pattern Anal Mach Intell 15(10):1042–1053CrossRef
go back to reference Cao F, Delon J, Desolneux A, Museé P, Sur F (2007) A unified framework for detecting groups and application to shape recognition. J Math Imaging Vis 27(2):91–119MathSciNetMATHCrossRef Cao F, Delon J, Desolneux A, Museé P, Sur F (2007) A unified framework for detecting groups and application to shape recognition. J Math Imaging Vis 27(2):91–119MathSciNetMATHCrossRef
go back to reference Cao F, Lisani J-L, Morel J-M, Museé P, Sur F (2008) A theory of shape identification. Springer, BerlinMATHCrossRef Cao F, Lisani J-L, Morel J-M, Museé P, Sur F (2008) A theory of shape identification. Springer, BerlinMATHCrossRef
go back to reference Chen G, Lerman G (2009) Spectral curvature clustering (SCC). Int J Comput Vis 81(3):317–330CrossRef Chen G, Lerman G (2009) Spectral curvature clustering (SCC). Int J Comput Vis 81(3):317–330CrossRef
go back to reference Chen S, Zhang D (2004) Robust image segmentation using FCM with spatial constraints based on new kernel-induced distance measure. IEEE Trans Syst Man Cybern Part B 34(4):1907–1916CrossRef Chen S, Zhang D (2004) Robust image segmentation using FCM with spatial constraints based on new kernel-induced distance measure. IEEE Trans Syst Man Cybern Part B 34(4):1907–1916CrossRef
go back to reference Chung F (1997) Spectral graph theory. CBMS, vol 92. American Mathematical Society, Providence Chung F (1997) Spectral graph theory. CBMS, vol 92. American Mathematical Society, Providence
go back to reference Cominetti O, Matzavinos A, Samarasinghe S, Kulasiri D, Liu S, Maini P, Erban R (2010) Diffuzzy: a fuzzy clustering algorithm for complex data sets. Int J Comput Intell Bioinforma Syst Biol 1(4):402–417 Cominetti O, Matzavinos A, Samarasinghe S, Kulasiri D, Liu S, Maini P, Erban R (2010) Diffuzzy: a fuzzy clustering algorithm for complex data sets. Int J Comput Intell Bioinforma Syst Biol 1(4):402–417
go back to reference Desolneux A, Moisan L, Morel J-M (2008) From gestalt theory to image analysis: a probabilistic approach. Springer, New YorkCrossRef Desolneux A, Moisan L, Morel J-M (2008) From gestalt theory to image analysis: a probabilistic approach. Springer, New YorkCrossRef
go back to reference Franke M, Geyer-Schulz A (2009) An update algorithm for restricted random walk clustering for dynamic data sets. Adv Data Anal Classif 3(1):63–92MathSciNetMATHCrossRef Franke M, Geyer-Schulz A (2009) An update algorithm for restricted random walk clustering for dynamic data sets. Adv Data Anal Classif 3(1):63–92MathSciNetMATHCrossRef
go back to reference Gan G, Ma C, Wu J (2007) Data clustering: theory, algorithms, and applications. In: ASA-SIAM series on statistics and applied probability. SIAM, Philadelphia Gan G, Ma C, Wu J (2007) Data clustering: theory, algorithms, and applications. In: ASA-SIAM series on statistics and applied probability. SIAM, Philadelphia
go back to reference Georghiades A, Belhumeur P, Kriegman D (2001) From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans Pattern Anal Mach Intell 23(6):643–660CrossRef Georghiades A, Belhumeur P, Kriegman D (2001) From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans Pattern Anal Mach Intell 23(6):643–660CrossRef
go back to reference Haralick R, Harpaz R (2005) Linear manifold clustering. In: Perner P, Imiya A (eds) Machine learning and data mining in pattern recognition. Lecture notes in computer science, vol 3587. Springer, Berlin, pp 132–141 Haralick R, Harpaz R (2005) Linear manifold clustering. In: Perner P, Imiya A (eds) Machine learning and data mining in pattern recognition. Lecture notes in computer science, vol 3587. Springer, Berlin, pp 132–141
go back to reference Haralick R, Harpaz R (2007) Linear manifold clustering in high dimensional spaces by stochastic search. Pattern Recognit 40(10):2672–2684MATHCrossRef Haralick R, Harpaz R (2007) Linear manifold clustering in high dimensional spaces by stochastic search. Pattern Recognit 40(10):2672–2684MATHCrossRef
go back to reference Hathaway R, Bezdek J (2001) Fuzzy \(c\)-means clustering of incomplete data. IEEE Trans Syst Man Cybern Part B 31(5):735–744CrossRef Hathaway R, Bezdek J (2001) Fuzzy \(c\)-means clustering of incomplete data. IEEE Trans Syst Man Cybern Part B 31(5):735–744CrossRef
go back to reference He X, Yan S, Hu Y, Niyogi P, Zhang H-J (2005) Face recognition using laplacianfaces. IEEE Trans Pattern Anal Mach Intell 27(3):328–340CrossRef He X, Yan S, Hu Y, Niyogi P, Zhang H-J (2005) Face recognition using laplacianfaces. IEEE Trans Pattern Anal Mach Intell 27(3):328–340CrossRef
go back to reference Jain A (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31(8):651–666CrossRef Jain A (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31(8):651–666CrossRef
go back to reference Kogan J (2007) Introduction to clustering large and high-dimensional data. Cambridge University Press, New YorkMATH Kogan J (2007) Introduction to clustering large and high-dimensional data. Cambridge University Press, New YorkMATH
go back to reference Lee K-C, Ho JM, Kriegman D (2005) Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans Pattern Anal Mach Intell 27(5):684–698 Lee K-C, Ho JM, Kriegman D (2005) Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans Pattern Anal Mach Intell 27(5):684–698
go back to reference Levine E, Domany E (2001) Resampling method for unsupervised estimation of cluster validity. Neural Comput 13(11):2573–2593MATHCrossRef Levine E, Domany E (2001) Resampling method for unsupervised estimation of cluster validity. Neural Comput 13(11):2573–2593MATHCrossRef
go back to reference Liao C-S, Lu K, Baym M, Singh R, Berger B (2009) IsoRankN: spectral methods for global alignment of multiple protein networks. Bioinformatics 25(12):i253–i258CrossRef Liao C-S, Lu K, Baym M, Singh R, Berger B (2009) IsoRankN: spectral methods for global alignment of multiple protein networks. Bioinformatics 25(12):i253–i258CrossRef
go back to reference Macqueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability. University of California Press, pp 281–297 Macqueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability. University of California Press, pp 281–297
go back to reference Meila M (2006) The uniqueness of a good optimum for \(k\)-means. In: Cohen W, Moore A (eds) Proceedings of the 23rd international conference on machine Learning, pp 625–632 Meila M (2006) The uniqueness of a good optimum for \(k\)-means. In: Cohen W, Moore A (eds) Proceedings of the 23rd international conference on machine Learning, pp 625–632
go back to reference Miyamoto S, Ichihashi H, Honda K (2008) Algorithms for fuzzy clustering: methods in c-means clustering with applications. Studies in fuzziness and soft computing, vol 229. Springer, Berlin Miyamoto S, Ichihashi H, Honda K (2008) Algorithms for fuzzy clustering: methods in c-means clustering with applications. Studies in fuzziness and soft computing, vol 229. Springer, Berlin
go back to reference Ng A, Jordan M, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Leen T, Dietterich T, Tresp V (eds) Advances in neural information processing systems, vol 14. MIT Press, Cambridge, pp 849–856 Ng A, Jordan M, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Leen T, Dietterich T, Tresp V (eds) Advances in neural information processing systems, vol 14. MIT Press, Cambridge, pp 849–856
go back to reference Shental N, Bar-Hillel A, Hertz T, Weinshall D (2009) Gaussian mixture models with equivalence constraints. In: Basu S, Davidson I, Wagstaff K (eds) Constrained Clustering: advances in algorithms, theory, and applications. Chapman & Hall, London, pp 33–58 Shental N, Bar-Hillel A, Hertz T, Weinshall D (2009) Gaussian mixture models with equivalence constraints. In: Basu S, Davidson I, Wagstaff K (eds) Constrained Clustering: advances in algorithms, theory, and applications. Chapman & Hall, London, pp 33–58
go back to reference Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Image Segm 22(8):888–905CrossRef Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Image Segm 22(8):888–905CrossRef
go back to reference Snel B, Bork P, Huynen M (2002) The identification of functional modules from the genomic association of genes. PNAS 99(9):5890–5895CrossRef Snel B, Bork P, Huynen M (2002) The identification of functional modules from the genomic association of genes. PNAS 99(9):5890–5895CrossRef
go back to reference Späth H (1985) Cluster dissection and analysis. Ellis Horwood Ltd., ChichesterMATH Späth H (1985) Cluster dissection and analysis. Ellis Horwood Ltd., ChichesterMATH
go back to reference Tsao J, Lauterbur P (1998) Generalized clustering-based image registration for multi-modality images. Proc 20th Ann Int Conf IEEE Eng Med Biol Soc 20(2):667–670 Tsao J, Lauterbur P (1998) Generalized clustering-based image registration for multi-modality images. Proc 20th Ann Int Conf IEEE Eng Med Biol Soc 20(2):667–670
go back to reference Tziakos I, Theoharatos C, Laskaris N, Economou G (2009) Color image segmentation using Laplacian eigenmaps. J Electron Imaging 18(2):023004CrossRef Tziakos I, Theoharatos C, Laskaris N, Economou G (2009) Color image segmentation using Laplacian eigenmaps. J Electron Imaging 18(2):023004CrossRef
go back to reference Yen D, Vanvyve F, Wouters F, Fouss F, Verleysen M, Saerens M (2005) Clustering using a random walk based distance measure. In: Verleysen M (ed) In: Proceedings of the 13th European symposium on artificial, neural networks (ESANN), pp 317–324 Yen D, Vanvyve F, Wouters F, Fouss F, Verleysen M, Saerens M (2005) Clustering using a random walk based distance measure. In: Verleysen M (ed) In: Proceedings of the 13th European symposium on artificial, neural networks (ESANN), pp 317–324
Metadata
Title
Random walk distances in data clustering and applications
Authors
Sijia Liu
Anastasios Matzavinos
Sunder Sethuraman
Publication date
01-03-2013
Publisher
Springer-Verlag
Published in
Advances in Data Analysis and Classification / Issue 1/2013
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-013-0125-7

Other articles of this Issue 1/2013

Advances in Data Analysis and Classification 1/2013 Go to the issue

Editorial

Editorial

Premium Partner