Abstract
The goal of graph clustering is to partition vertices in a large graph into different clusters based on various criteria such as vertex connectivity or neighborhood similarity. Graph clustering techniques are very useful for detecting densely connected groups in a large graph. Many existing graph clustering methods mainly focus on the topological structure for clustering, but largely ignore the vertex properties which are often heterogenous. In this paper, we propose a novel graph clustering algorithm, SA-Cluster, based on both structural and attribute similarities through a unified distance measure. Our method partitions a large graph associated with attributes into k clusters so that each cluster contains a densely connected subgraph with homogeneous attribute values. An effective method is proposed to automatically learn the degree of contributions of structural similarity and attribute similarity. Theoretical analysis is provided to show that SA-Cluster is converging. Extensive experimental results demonstrate the effectiveness of SA-Cluster through comparison with the state-of-the-art graph clustering and summarization methods.
- R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD '98), pages 94--105, Seattle, WA, June 1998. Google ScholarDigital Library
- T. M. Apostol. Calculus, Vol. 1: One-Variable Calculus, with an Introduction to Linear Algebra, 2nd edition. Wiley, 1967.Google Scholar
- L. Botton and Y. Bengio. Convergence properties of the k-means algorithms. In Advances in Neural Information Processing Systems 7 (NIPS'94), pages 585--592, Denver, CO, Dec. 1994.Google Scholar
- D. Cai, Z. Shao, X. He, X. Yan, and J. Han. Mining hidden community in heterogeneous social networks. In Proc. Workshop on Link Discovery: Issues, Approaches and Applications (LinkKDD'05), pages 58--65, Chicago, IL, Aug. 2005. Google ScholarDigital Library
- R. Descartes. The Geometry of René Descartes. Dover Publications, 1954.Google Scholar
- M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. In Proc. 1996 Int. Conf. Knowledge Discovery and Data Mining (KDD'96), pages 226--231, Portland, OR, Aug. 1996.Google Scholar
- C. Faloutsos, K. McCurley, and A. Tomkins. Fast discovery of connection subgraphs. In Proc. 2004 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD'04), pages 118--127, Seattle, WA, Aug. 2004. Google ScholarDigital Library
- B. Fine and G. Rosenber. The Fundamental Theorem of Algebra (Undergraduate Texts in Mathematics). Springer-Verlag, 1997.Google Scholar
- D. Gibson, J. Kleinberg, and P. Raghavan. Inferring web communities from link topology. In Proc. 9th ACM Conf. Hypertext and Hypermedia, pages 225--234, Pittsburgh, PA, June 1998. Google ScholarDigital Library
- A. Hinneburg and D. A. Keim. An efficient approach to clustering in large multimedia databases with noise. In Proc. 1998 Int. Conf. Knowledge Discovery and Data Mining (KDD'98), pages 58--65, New York, NY, Aug. 1998.Google Scholar
- T. Hofmann. Probabilistic latent semantic indexing. In Proc. 1999 Int. ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR'99), pages 50--57, Berkeley, CA, Aug. 1998. Google ScholarDigital Library
- G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In Proc. 2002 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'02), pages 538--543, Edmonton, Canada, July 2002. Google ScholarDigital Library
- L. Kaufman and P. J. Rousseeuw. Clustering by means of medoids. Statistical Data Analysis based on the L1 Norm, pages 405--416, 1987.Google Scholar
- Z. Liu, J. X. Yu, Y. Ke, X. Lin, and L. Chen. Spotting significant changing subgraphs in evolving graphs. In Proc. 2008 Int. Conf. Data Mining (ICDM'08), Pisa, Italy, Dec. 2008. Google ScholarDigital Library
- S. Navlakha, R. Rastogi, and N. Shrivastava. Graph summarization with bounded error. In Proc. 2008 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'08), pages 419--432, Vancouver, Canada, June 2008. Google ScholarDigital Library
- M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. In Phys. Rev. E 69, 026113, 2004.Google ScholarCross Ref
- R. Ng and J. Han. Efficient and effective clustering method for spatial data mining. In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB'94), pages 144--155, Santiago, Chile, Sept. 1994. Google ScholarDigital Library
- P. Pons and M. Latapy. Computing communities in large networks using random walks. Journal of Graph Algorithms and Applications, 10(2):191--218, 2006.Google ScholarCross Ref
- J. Shi and J. Malik. Normalized cuts and image segmentation. In IEEE Trans. Pattern Analysis and Machine Intelligence, volume 22, no. 8, pages 888--905, 2000. Google ScholarDigital Library
- J. Sun, C. Faloutsos, S. Papadimitriou, and P. S. Yu. Graphscope: parameter-free mining of large time-evolving graphs. In Proc. 2007 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD'07), pages 687--696, San Jose, CA, 2007. Google ScholarDigital Library
- Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu. Rankclus: Integrating clustering with ranking for heterogenous information network analysis. In Proc. 2009 Int. Conf. Extending Database Technology (EDBT'09), pages 565--576, Saint Petersburg, Russia, Mar. 2009. Google ScholarDigital Library
- Y. Tian, R. A. Hankins, and J. M. Patel. Efficient aggregation for graph summarization. In Proc. 2008 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'08), pages 567--580, Vancouver, Canada, June 2008. Google ScholarDigital Library
- H. Tong and C. Faloutsos. Center-piece subgraphs: problem definition and fast solutions. In Proc. 2006 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD'06), pages 404--413, Philadelphia, PA, 2006. Google ScholarDigital Library
- H. Tong, C. Faloutsos, and J.-Y. Pan. Fast random walk with restart and its applications. In Proc. 2006 Int. Conf. on Data Mining (ICDM'06), pages 613--622, Hong Kong, Dec. 2006. Google ScholarDigital Library
- C.-Y. Tsai and C.-C. Chui. Developing a feature weight self-adjustment mechanism for a k-means clustering algorithm. Computational Statistics and Data Analysis, 52:4658--4672, 2008. Google ScholarDigital Library
- X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger. Scan: a structural clustering algorithm for networks. In Proc. 2007 Int. Conf. Knowledge Discovery and Data Mining (KDD'07), pages 824--833, San Jose, CA, Aug. 2007. Google ScholarDigital Library
- C. Zhai, A. Velivelli, and B. Yu. A cross-collection mixture model for comparative text mining. In Proc. 2004 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD'04), pages 743--748, Seattle, WA, Aug. 2004. Google ScholarDigital Library
Index Terms
- Graph clustering based on structural/attribute similarities
Recommendations
Online structural graph clustering using frequent subgraph mining
ECMLPKDD'10: Proceedings of the 2010th European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part IIIThe goal of graph clustering is to partition objects in a graph database into different clusters based on various criteria such as vertex connectivity, neighborhood similarity or the size of the maximum common subgraph. This can serve to structure the ...
Graph clustering using k-Neighbourhood Attribute Structural similarity
Display Omitted Novel graph clustering algorithms (kNAS) is proposed, for overlapping community detection in large graph by combining the topological and attribute similarity that partitions the large graph into m clusters having high intracluster and ...
Efficient structural graph clustering: an index-based approach
Graph clustering is a fundamental problem widely experienced across many industries. The structural graph clustering (SCAN) method obtains not only clusters but also hubs and outliers. However, the clustering results closely depend on two sensitive ...
Comments