Skip to main content
Erschienen in: Advances in Data Analysis and Classification 1/2013

01.03.2013 | Regular Article

Random walk distances in data clustering and applications

verfasst von: Sijia Liu, Anastasios Matzavinos, Sunder Sethuraman

Erschienen in: Advances in Data Analysis and Classification | Ausgabe 1/2013

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this paper, we develop a family of data clustering algorithms that combine the strengths of existing spectral approaches to clustering with various desirable properties of fuzzy methods. In particular, we show that the developed method “Fuzzy-RW,” outperforms other frequently used algorithms in data sets with different geometries. As applications, we discuss data clustering of biological and face recognition benchmarks such as the IRIS and YALE face data sets.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
For further discussion of the emerging role of data geometry in the development of data clustering algorithms (see, e.g., Chen and Lerman 2009; Haralick and Harpaz 2007; Coifman and Lafon 2006).
 
2
As it is usually the case in data clustering applications, the employed distance measures do not have to necessarily satisfy the properties of a metric (see, e.g., Chen and Zhang 2004).
 
3
Data clustering on linear manifolds, or affine spaces was first introduced by Bock (1974). Adopting the terminology of Haralick and Harpaz (2007), we say that \(L\) is a linear manifold in a vector space \(V\) if for some vector subspace \(S\) of \(V\) and some translation \(t\in V,\,L=\{t+s \mid s\in S\}.\)
 
Literatur
Zurück zum Zitat Abonyi J, Feil B (2007) Cluster analysis for data mining and system identification. Birkhäuser, BaselMATH Abonyi J, Feil B (2007) Cluster analysis for data mining and system identification. Birkhäuser, BaselMATH
Zurück zum Zitat Arias-Castro E, Chen G, Lerman G (2010) Spectral clustering based on local linear approximations. arXiv:1001.1323v1 Arias-Castro E, Chen G, Lerman G (2010) Spectral clustering based on local linear approximations. arXiv:1001.1323v1
Zurück zum Zitat Belhumeur P, Hespanha J, Kriegman D (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720CrossRef Belhumeur P, Hespanha J, Kriegman D (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720CrossRef
Zurück zum Zitat Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 16:1373–1396CrossRef Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 16:1373–1396CrossRef
Zurück zum Zitat Ben-Hur A, Elisseeff A, Guyon I (2002) A stability based method for discovering structure in clustered data. In: Pacific Symposium on Biocomputing, pp 6–17 Ben-Hur A, Elisseeff A, Guyon I (2002) A stability based method for discovering structure in clustered data. In: Pacific Symposium on Biocomputing, pp 6–17
Zurück zum Zitat Bezdek J, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10: 191–203 Bezdek J, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10: 191–203
Zurück zum Zitat Bezdek J, Hall L, Clark M, Goldgof D, Clarke L (1997) Medical image analysis with fuzzy models. Stat Methods Med Res 6:191–214CrossRef Bezdek J, Hall L, Clark M, Goldgof D, Clarke L (1997) Medical image analysis with fuzzy models. Stat Methods Med Res 6:191–214CrossRef
Zurück zum Zitat Bock H-H (1974) Automatische Klassifikation. Theoretische und praktische Methoden zur Gruppierung und Strukturierung von Daten (Clusteranalyse). Vandenhoek & Ruprecht, Göttingen (in German) Bock H-H (1974) Automatische Klassifikation. Theoretische und praktische Methoden zur Gruppierung und Strukturierung von Daten (Clusteranalyse). Vandenhoek & Ruprecht, Göttingen (in German)
Zurück zum Zitat Bock H-H (1987) On the interface between cluster analysis, principal component clustering, and multidimensional scaling. In: Bozdogan H, Gupta A (eds) Multivariate statistical modeling and data analysis. Reidel, Dordrecht, pp 17–34CrossRef Bock H-H (1987) On the interface between cluster analysis, principal component clustering, and multidimensional scaling. In: Bozdogan H, Gupta A (eds) Multivariate statistical modeling and data analysis. Reidel, Dordrecht, pp 17–34CrossRef
Zurück zum Zitat Brémaud P (1999) Markov chains: Gibbs fields, Monte Carlo simulation, and queues. Springer, New YorkMATH Brémaud P (1999) Markov chains: Gibbs fields, Monte Carlo simulation, and queues. Springer, New YorkMATH
Zurück zum Zitat Brunelli R, Poggio T (1993) Face recognition: features vs. templates. IEEE Trans Pattern Anal Mach Intell 15(10):1042–1053CrossRef Brunelli R, Poggio T (1993) Face recognition: features vs. templates. IEEE Trans Pattern Anal Mach Intell 15(10):1042–1053CrossRef
Zurück zum Zitat Cao F, Delon J, Desolneux A, Museé P, Sur F (2007) A unified framework for detecting groups and application to shape recognition. J Math Imaging Vis 27(2):91–119MathSciNetMATHCrossRef Cao F, Delon J, Desolneux A, Museé P, Sur F (2007) A unified framework for detecting groups and application to shape recognition. J Math Imaging Vis 27(2):91–119MathSciNetMATHCrossRef
Zurück zum Zitat Cao F, Lisani J-L, Morel J-M, Museé P, Sur F (2008) A theory of shape identification. Springer, BerlinMATHCrossRef Cao F, Lisani J-L, Morel J-M, Museé P, Sur F (2008) A theory of shape identification. Springer, BerlinMATHCrossRef
Zurück zum Zitat Chen G, Lerman G (2009) Spectral curvature clustering (SCC). Int J Comput Vis 81(3):317–330CrossRef Chen G, Lerman G (2009) Spectral curvature clustering (SCC). Int J Comput Vis 81(3):317–330CrossRef
Zurück zum Zitat Chen S, Zhang D (2004) Robust image segmentation using FCM with spatial constraints based on new kernel-induced distance measure. IEEE Trans Syst Man Cybern Part B 34(4):1907–1916CrossRef Chen S, Zhang D (2004) Robust image segmentation using FCM with spatial constraints based on new kernel-induced distance measure. IEEE Trans Syst Man Cybern Part B 34(4):1907–1916CrossRef
Zurück zum Zitat Chung F (1997) Spectral graph theory. CBMS, vol 92. American Mathematical Society, Providence Chung F (1997) Spectral graph theory. CBMS, vol 92. American Mathematical Society, Providence
Zurück zum Zitat Cominetti O, Matzavinos A, Samarasinghe S, Kulasiri D, Liu S, Maini P, Erban R (2010) Diffuzzy: a fuzzy clustering algorithm for complex data sets. Int J Comput Intell Bioinforma Syst Biol 1(4):402–417 Cominetti O, Matzavinos A, Samarasinghe S, Kulasiri D, Liu S, Maini P, Erban R (2010) Diffuzzy: a fuzzy clustering algorithm for complex data sets. Int J Comput Intell Bioinforma Syst Biol 1(4):402–417
Zurück zum Zitat Desolneux A, Moisan L, Morel J-M (2008) From gestalt theory to image analysis: a probabilistic approach. Springer, New YorkCrossRef Desolneux A, Moisan L, Morel J-M (2008) From gestalt theory to image analysis: a probabilistic approach. Springer, New YorkCrossRef
Zurück zum Zitat Franke M, Geyer-Schulz A (2009) An update algorithm for restricted random walk clustering for dynamic data sets. Adv Data Anal Classif 3(1):63–92MathSciNetMATHCrossRef Franke M, Geyer-Schulz A (2009) An update algorithm for restricted random walk clustering for dynamic data sets. Adv Data Anal Classif 3(1):63–92MathSciNetMATHCrossRef
Zurück zum Zitat Fu L, Medico E (2007) FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data. BMC Bioinforma 8(3). doi:10.11861471-2105-8-3 Fu L, Medico E (2007) FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data. BMC Bioinforma 8(3). doi:10.​11861471-2105-8-3
Zurück zum Zitat Gan G, Ma C, Wu J (2007) Data clustering: theory, algorithms, and applications. In: ASA-SIAM series on statistics and applied probability. SIAM, Philadelphia Gan G, Ma C, Wu J (2007) Data clustering: theory, algorithms, and applications. In: ASA-SIAM series on statistics and applied probability. SIAM, Philadelphia
Zurück zum Zitat Georghiades A, Belhumeur P, Kriegman D (2001) From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans Pattern Anal Mach Intell 23(6):643–660CrossRef Georghiades A, Belhumeur P, Kriegman D (2001) From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans Pattern Anal Mach Intell 23(6):643–660CrossRef
Zurück zum Zitat Haralick R, Harpaz R (2005) Linear manifold clustering. In: Perner P, Imiya A (eds) Machine learning and data mining in pattern recognition. Lecture notes in computer science, vol 3587. Springer, Berlin, pp 132–141 Haralick R, Harpaz R (2005) Linear manifold clustering. In: Perner P, Imiya A (eds) Machine learning and data mining in pattern recognition. Lecture notes in computer science, vol 3587. Springer, Berlin, pp 132–141
Zurück zum Zitat Haralick R, Harpaz R (2007) Linear manifold clustering in high dimensional spaces by stochastic search. Pattern Recognit 40(10):2672–2684MATHCrossRef Haralick R, Harpaz R (2007) Linear manifold clustering in high dimensional spaces by stochastic search. Pattern Recognit 40(10):2672–2684MATHCrossRef
Zurück zum Zitat Hathaway R, Bezdek J (2001) Fuzzy \(c\)-means clustering of incomplete data. IEEE Trans Syst Man Cybern Part B 31(5):735–744CrossRef Hathaway R, Bezdek J (2001) Fuzzy \(c\)-means clustering of incomplete data. IEEE Trans Syst Man Cybern Part B 31(5):735–744CrossRef
Zurück zum Zitat He X, Yan S, Hu Y, Niyogi P, Zhang H-J (2005) Face recognition using laplacianfaces. IEEE Trans Pattern Anal Mach Intell 27(3):328–340CrossRef He X, Yan S, Hu Y, Niyogi P, Zhang H-J (2005) Face recognition using laplacianfaces. IEEE Trans Pattern Anal Mach Intell 27(3):328–340CrossRef
Zurück zum Zitat Jain A (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31(8):651–666CrossRef Jain A (2010) Data clustering: 50 years beyond K-means. Pattern Recognit Lett 31(8):651–666CrossRef
Zurück zum Zitat Kogan J (2007) Introduction to clustering large and high-dimensional data. Cambridge University Press, New YorkMATH Kogan J (2007) Introduction to clustering large and high-dimensional data. Cambridge University Press, New YorkMATH
Zurück zum Zitat Lee K-C, Ho JM, Kriegman D (2005) Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans Pattern Anal Mach Intell 27(5):684–698 Lee K-C, Ho JM, Kriegman D (2005) Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans Pattern Anal Mach Intell 27(5):684–698
Zurück zum Zitat Levine E, Domany E (2001) Resampling method for unsupervised estimation of cluster validity. Neural Comput 13(11):2573–2593MATHCrossRef Levine E, Domany E (2001) Resampling method for unsupervised estimation of cluster validity. Neural Comput 13(11):2573–2593MATHCrossRef
Zurück zum Zitat Liao C-S, Lu K, Baym M, Singh R, Berger B (2009) IsoRankN: spectral methods for global alignment of multiple protein networks. Bioinformatics 25(12):i253–i258CrossRef Liao C-S, Lu K, Baym M, Singh R, Berger B (2009) IsoRankN: spectral methods for global alignment of multiple protein networks. Bioinformatics 25(12):i253–i258CrossRef
Zurück zum Zitat Macqueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability. University of California Press, pp 281–297 Macqueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability. University of California Press, pp 281–297
Zurück zum Zitat Meila M (2006) The uniqueness of a good optimum for \(k\)-means. In: Cohen W, Moore A (eds) Proceedings of the 23rd international conference on machine Learning, pp 625–632 Meila M (2006) The uniqueness of a good optimum for \(k\)-means. In: Cohen W, Moore A (eds) Proceedings of the 23rd international conference on machine Learning, pp 625–632
Zurück zum Zitat Miyamoto S, Ichihashi H, Honda K (2008) Algorithms for fuzzy clustering: methods in c-means clustering with applications. Studies in fuzziness and soft computing, vol 229. Springer, Berlin Miyamoto S, Ichihashi H, Honda K (2008) Algorithms for fuzzy clustering: methods in c-means clustering with applications. Studies in fuzziness and soft computing, vol 229. Springer, Berlin
Zurück zum Zitat Muller N, Magaia L, Herbst B (2004) Singular value decomposition, eigenfaces, and 3D reconstructions. SIAM Rev 46(3):518–545MathSciNetMATHCrossRef Muller N, Magaia L, Herbst B (2004) Singular value decomposition, eigenfaces, and 3D reconstructions. SIAM Rev 46(3):518–545MathSciNetMATHCrossRef
Zurück zum Zitat Ng A, Jordan M, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Leen T, Dietterich T, Tresp V (eds) Advances in neural information processing systems, vol 14. MIT Press, Cambridge, pp 849–856 Ng A, Jordan M, Weiss Y (2002) On spectral clustering: analysis and an algorithm. In: Leen T, Dietterich T, Tresp V (eds) Advances in neural information processing systems, vol 14. MIT Press, Cambridge, pp 849–856
Zurück zum Zitat Shental N, Bar-Hillel A, Hertz T, Weinshall D (2009) Gaussian mixture models with equivalence constraints. In: Basu S, Davidson I, Wagstaff K (eds) Constrained Clustering: advances in algorithms, theory, and applications. Chapman & Hall, London, pp 33–58 Shental N, Bar-Hillel A, Hertz T, Weinshall D (2009) Gaussian mixture models with equivalence constraints. In: Basu S, Davidson I, Wagstaff K (eds) Constrained Clustering: advances in algorithms, theory, and applications. Chapman & Hall, London, pp 33–58
Zurück zum Zitat Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Image Segm 22(8):888–905CrossRef Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Image Segm 22(8):888–905CrossRef
Zurück zum Zitat Snel B, Bork P, Huynen M (2002) The identification of functional modules from the genomic association of genes. PNAS 99(9):5890–5895CrossRef Snel B, Bork P, Huynen M (2002) The identification of functional modules from the genomic association of genes. PNAS 99(9):5890–5895CrossRef
Zurück zum Zitat Späth H (1985) Cluster dissection and analysis. Ellis Horwood Ltd., ChichesterMATH Späth H (1985) Cluster dissection and analysis. Ellis Horwood Ltd., ChichesterMATH
Zurück zum Zitat Tsao J, Lauterbur P (1998) Generalized clustering-based image registration for multi-modality images. Proc 20th Ann Int Conf IEEE Eng Med Biol Soc 20(2):667–670 Tsao J, Lauterbur P (1998) Generalized clustering-based image registration for multi-modality images. Proc 20th Ann Int Conf IEEE Eng Med Biol Soc 20(2):667–670
Zurück zum Zitat Tziakos I, Theoharatos C, Laskaris N, Economou G (2009) Color image segmentation using Laplacian eigenmaps. J Electron Imaging 18(2):023004CrossRef Tziakos I, Theoharatos C, Laskaris N, Economou G (2009) Color image segmentation using Laplacian eigenmaps. J Electron Imaging 18(2):023004CrossRef
Zurück zum Zitat Yen D, Vanvyve F, Wouters F, Fouss F, Verleysen M, Saerens M (2005) Clustering using a random walk based distance measure. In: Verleysen M (ed) In: Proceedings of the 13th European symposium on artificial, neural networks (ESANN), pp 317–324 Yen D, Vanvyve F, Wouters F, Fouss F, Verleysen M, Saerens M (2005) Clustering using a random walk based distance measure. In: Verleysen M (ed) In: Proceedings of the 13th European symposium on artificial, neural networks (ESANN), pp 317–324
Metadaten
Titel
Random walk distances in data clustering and applications
verfasst von
Sijia Liu
Anastasios Matzavinos
Sunder Sethuraman
Publikationsdatum
01.03.2013
Verlag
Springer-Verlag
Erschienen in
Advances in Data Analysis and Classification / Ausgabe 1/2013
Print ISSN: 1862-5347
Elektronische ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-013-0125-7

Weitere Artikel der Ausgabe 1/2013

Advances in Data Analysis and Classification 1/2013 Zur Ausgabe

Editorial

Editorial