skip to main content
research-article

Graph clustering based on structural/attribute similarities

Published:01 August 2009Publication History
Skip Abstract Section

Abstract

The goal of graph clustering is to partition vertices in a large graph into different clusters based on various criteria such as vertex connectivity or neighborhood similarity. Graph clustering techniques are very useful for detecting densely connected groups in a large graph. Many existing graph clustering methods mainly focus on the topological structure for clustering, but largely ignore the vertex properties which are often heterogenous. In this paper, we propose a novel graph clustering algorithm, SA-Cluster, based on both structural and attribute similarities through a unified distance measure. Our method partitions a large graph associated with attributes into k clusters so that each cluster contains a densely connected subgraph with homogeneous attribute values. An effective method is proposed to automatically learn the degree of contributions of structural similarity and attribute similarity. Theoretical analysis is provided to show that SA-Cluster is converging. Extensive experimental results demonstrate the effectiveness of SA-Cluster through comparison with the state-of-the-art graph clustering and summarization methods.

References

  1. R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD '98), pages 94--105, Seattle, WA, June 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. T. M. Apostol. Calculus, Vol. 1: One-Variable Calculus, with an Introduction to Linear Algebra, 2nd edition. Wiley, 1967.Google ScholarGoogle Scholar
  3. L. Botton and Y. Bengio. Convergence properties of the k-means algorithms. In Advances in Neural Information Processing Systems 7 (NIPS'94), pages 585--592, Denver, CO, Dec. 1994.Google ScholarGoogle Scholar
  4. D. Cai, Z. Shao, X. He, X. Yan, and J. Han. Mining hidden community in heterogeneous social networks. In Proc. Workshop on Link Discovery: Issues, Approaches and Applications (LinkKDD'05), pages 58--65, Chicago, IL, Aug. 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Descartes. The Geometry of René Descartes. Dover Publications, 1954.Google ScholarGoogle Scholar
  6. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. In Proc. 1996 Int. Conf. Knowledge Discovery and Data Mining (KDD'96), pages 226--231, Portland, OR, Aug. 1996.Google ScholarGoogle Scholar
  7. C. Faloutsos, K. McCurley, and A. Tomkins. Fast discovery of connection subgraphs. In Proc. 2004 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD'04), pages 118--127, Seattle, WA, Aug. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. B. Fine and G. Rosenber. The Fundamental Theorem of Algebra (Undergraduate Texts in Mathematics). Springer-Verlag, 1997.Google ScholarGoogle Scholar
  9. D. Gibson, J. Kleinberg, and P. Raghavan. Inferring web communities from link topology. In Proc. 9th ACM Conf. Hypertext and Hypermedia, pages 225--234, Pittsburgh, PA, June 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Hinneburg and D. A. Keim. An efficient approach to clustering in large multimedia databases with noise. In Proc. 1998 Int. Conf. Knowledge Discovery and Data Mining (KDD'98), pages 58--65, New York, NY, Aug. 1998.Google ScholarGoogle Scholar
  11. T. Hofmann. Probabilistic latent semantic indexing. In Proc. 1999 Int. ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR'99), pages 50--57, Berkeley, CA, Aug. 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In Proc. 2002 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'02), pages 538--543, Edmonton, Canada, July 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. L. Kaufman and P. J. Rousseeuw. Clustering by means of medoids. Statistical Data Analysis based on the L1 Norm, pages 405--416, 1987.Google ScholarGoogle Scholar
  14. Z. Liu, J. X. Yu, Y. Ke, X. Lin, and L. Chen. Spotting significant changing subgraphs in evolving graphs. In Proc. 2008 Int. Conf. Data Mining (ICDM'08), Pisa, Italy, Dec. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. Navlakha, R. Rastogi, and N. Shrivastava. Graph summarization with bounded error. In Proc. 2008 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'08), pages 419--432, Vancouver, Canada, June 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. In Phys. Rev. E 69, 026113, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  17. R. Ng and J. Han. Efficient and effective clustering method for spatial data mining. In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB'94), pages 144--155, Santiago, Chile, Sept. 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. P. Pons and M. Latapy. Computing communities in large networks using random walks. Journal of Graph Algorithms and Applications, 10(2):191--218, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  19. J. Shi and J. Malik. Normalized cuts and image segmentation. In IEEE Trans. Pattern Analysis and Machine Intelligence, volume 22, no. 8, pages 888--905, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. Sun, C. Faloutsos, S. Papadimitriou, and P. S. Yu. Graphscope: parameter-free mining of large time-evolving graphs. In Proc. 2007 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD'07), pages 687--696, San Jose, CA, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu. Rankclus: Integrating clustering with ranking for heterogenous information network analysis. In Proc. 2009 Int. Conf. Extending Database Technology (EDBT'09), pages 565--576, Saint Petersburg, Russia, Mar. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Y. Tian, R. A. Hankins, and J. M. Patel. Efficient aggregation for graph summarization. In Proc. 2008 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'08), pages 567--580, Vancouver, Canada, June 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. H. Tong and C. Faloutsos. Center-piece subgraphs: problem definition and fast solutions. In Proc. 2006 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD'06), pages 404--413, Philadelphia, PA, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. H. Tong, C. Faloutsos, and J.-Y. Pan. Fast random walk with restart and its applications. In Proc. 2006 Int. Conf. on Data Mining (ICDM'06), pages 613--622, Hong Kong, Dec. 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. C.-Y. Tsai and C.-C. Chui. Developing a feature weight self-adjustment mechanism for a k-means clustering algorithm. Computational Statistics and Data Analysis, 52:4658--4672, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger. Scan: a structural clustering algorithm for networks. In Proc. 2007 Int. Conf. Knowledge Discovery and Data Mining (KDD'07), pages 824--833, San Jose, CA, Aug. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. C. Zhai, A. Velivelli, and B. Yu. A cross-collection mixture model for comparative text mining. In Proc. 2004 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD'04), pages 743--748, Seattle, WA, Aug. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Graph clustering based on structural/attribute similarities

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader