ABSTRACT
Graph clustering and graph outlier detection have been studied extensively on plain graphs, with various applications. Recently, algorithms have been extended to graphs with attributes as often observed in the real-world. However, all of these techniques fail to incorporate the user preference into graph mining, and thus, lack the ability to steer algorithms to more interesting parts of the attributed graph. In this work, we overcome this limitation and introduce a novel user-oriented approach for mining attributed graphs. The key aspect of our approach is to infer user preference by the so-called focus attributes through a set of user-provided exemplar nodes. In this new problem setting, clusters and outliers are then simultaneously mined according to this user preference. Specifically, our FocusCO algorithm identifies the focus, extracts focused clusters and detects outliers. Moreover, FocusCO scales well with graph size, since we perform a local clustering of interest to the user rather than global partitioning of the entire graph. We show the effectiveness and scalability of our method on synthetic and real-world graphs, as compared to both existing graph clustering and outlier detection approaches.
Supplemental Material
- L. Akoglu, M. McGlohon, and C. Faloutsos. OddBall: Spotting anomalies in weighted graphs. In PAKDD, 2010. Google ScholarDigital Library
- L. Akoglu, H. Tong, B. Meeder, and C. Faloutsos. PICS: Parameter-free identification of cohesive subgroups in large attributed graphs. In SIAM SDM, pages 439--450, 2012.Google ScholarCross Ref
- R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. In FOCS, 2006. Google ScholarDigital Library
- A. Banerjee, S. Basu, and S. Merugu. Multi-way clustering on relation graphs. In SIAM SDM, 2007.Google ScholarCross Ref
- S. Basu, I. Davidson, and K. Wagstaff. Clustering with Constraints: Algorithms, Applications and Theory. 2008.Google Scholar
- S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, NY, USA, 2004. Google ScholarDigital Library
- D. Chakrabarti, S. Papadimitriou, D. S. Modha, and C. Faloutsos. Fully automatic cross-associations. In KDD, pages 79--88, 2004. Google ScholarDigital Library
- A. Clauset. Finding local community structure in networks. Physical Review E, 72(2):026132, 2005.Google ScholarCross Ref
- A. Condon and R. M. Karp. Algorithms for graph partitioning on the planted partition model. Random Struct. Algorithms, 18(2):116--140, 2001. Google ScholarDigital Library
- I. Dhillon, S. Mallela, and D. Modha. Information-theoretic co-clustering. In KDD, 2003. Google ScholarDigital Library
- C. H. Q. Ding, X. He, H. Zha, M. Gu, and H. D. Simon. A min-max cut algorithm for graph partitioning and data clustering. In ICDM, 2001. Google ScholarDigital Library
- G. W. Flake, S. Lawrence, and C. L. Giles. Efficient identification of web communities. In KDD. 2000. Google ScholarDigital Library
- J. Gao, H. Cheng, and P.-N. Tan. Semi-supervised outlier detection. In ACM Symp. on Appl. Comp., 2006. Google ScholarDigital Library
- J. Gao, F. Liang, W. Fan, C. Wang, Y. Sun, and J. Han. On community outliers and their efficient detection in information networks. In KDD, pages 813--822, 2010. Google ScholarDigital Library
- D. F. Gleich and C. Seshadhri. Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In KDD, pages 597--605, 2012. Google ScholarDigital Library
- S. Günnemann, I. Färber, B. Boden, and T. Seidl. Subspace clustering meets dense subgraph mining: A synthesis of two paradigms. In ICDM, 2010.Google ScholarDigital Library
- D. Hanisch, A. Zien, R. Zimmer, and T. Lengauer. Co-clustering of biological networks and gene expression data. In ISMB, pages 145--154, 2002.Google ScholarCross Ref
- P. Iglesias, E. Müller, F. Laforet, F. Keller, and K. Böhm. Statistical selection of congruent subspaces for outlier detection on attributed graphs. In ICDM, 2013.Google Scholar
- G. Karypis and V. Kumar. Multilevel algorithms for multi-constraint graph partitioning. In Proc. of Supercomputing, pages 1--13, 1998. Google ScholarDigital Library
- B. Long, Z. Zhang, X. Wu, and P. S. Yu. Spectral clustering for multi-type relational data. In ICML, 2006. Google ScholarDigital Library
- C. D. Manning, P. Raghavan, and H. Schtze. Intro to Information Retrieval. Cambridge University Press, 2008. Google ScholarDigital Library
- F. Moser, R. Colak, A. Rafiey, and M. Ester. Mining cohesive patterns from graphs with feature vectors. In SDM, 2009.Google ScholarCross Ref
- E. Muller, P. I. Sanchez, Y. Mülle, and K. Böhm. Ranking outlier nodes in subspaces of attributed graphs. In ICDE Workshops, pages 216--222, 2013.Google ScholarCross Ref
- A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2001.Google ScholarDigital Library
- C. C. Noble and D. J. Cook. Graph-based anomaly detection. In KDD, pages 631--636, 2003. Google ScholarDigital Library
- J. Tang and H. Liu. Unsupervised feature selection for linked social media data. In KDD, pages 904--912, 2012. Google ScholarDigital Library
- Y. Tian, R. A. Hankins, and J. M. Patel. Efficient aggregation for graph summarization. In SIGMOD, 2008. Google ScholarDigital Library
- H. Tong and C.-Y. Lin. Non-negative residual matrix factorization with application to graph anomaly detection. In SIAM SDM, pages 143--153, 2011.Google ScholarCross Ref
- F. Wang and J. Sun. Distance metric learning in data mining. In SIAM SDM. Tutorials, 2012.Google Scholar
- J. Whang, D. Gleich, and I. Dhillon. Overlapping community detection using seed set expansion. In CIKM, pages 2099--2108, 2013. Google ScholarDigital Library
- S. White and P. Smyth. A spectral clustering approach to finding communities in graph. In SDM, 2005.Google ScholarCross Ref
- E. P. Xing, A. Y. Ng, M. I. Jordan, and S. J. Russell. Distance metric learning with application to clustering with side-information. In NIPS, pages 505--512, 2002.Google ScholarDigital Library
- J. Yang and J. Leskovec. Overlapping community detection at scale: a nonnegative matrix factorization approach. In WSDM, pages 587--596, 2013. Google ScholarDigital Library
- Y. Zhou, H. Cheng, and J. X. Yu. Graph clustering based on structural/attribute similarities. PVLDB, 2(1):718--729, 2009. Google ScholarDigital Library
Index Terms
- Focused clustering and outlier detection in large attributed graphs
Recommendations
Clustering Large Attributed Graphs: A Balance between Structural and Attribute Similarities
Social networks, sensor networks, biological networks, and many other information networks can be modeled as a large graph. Graph vertices represent entities, and graph edges represent their relationships or interactions. In many large graphs, there is ...
Extending bootstrap AMG for clustering of attributed graphs
Highlights- A new generalized spectral clustering method to detect community in attributed graphs.
- A linear complexity (scalable) algorithm, based on a bootstrap Algebraic MultiGrid method, to estimate near-kernel components of graph Laplacian ...
AbstractIn this paper we propose a new approach to detect clusters in undirected graphs with attributed vertices. We incorporate structural and attribute similarities between the vertices in an augmented graph by creating additional vertices and edges as ...
Spectral Clustering of Attributed Multi-relational Graphs
KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data MiningGraph clustering aims at discovering a natural grouping of the nodes such that similar nodes are assigned to a common cluster. Many different algorithms have been proposed in the literature: for simple graphs, for graphs with attributes associated to ...
Comments