research-article

Focused clustering and outlier detection in large attributed graphs

Authors:
Bryan Perozzi

Stony Brook University, Stony Brook, NY, USA

Stony Brook University, Stony Brook, NY, USA
View Profile

,
Leman Akoglu

Stony Brook University, Stony Brook, NY, USA

Stony Brook University, Stony Brook, NY, USA
View Profile

,
Patricia Iglesias Sánchez

Karlsruhe Institute of Technology, Karlsruhe, Germany

Karlsruhe Institute of Technology, Karlsruhe, Germany
View Profile

,
Emmanuel Müller

Karlsruhe Institute of Technology & University of Antwerp, Karlsruhe, Germany

Karlsruhe Institute of Technology & University of Antwerp, Karlsruhe, Germany
View Profile

KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data miningAugust 2014Pages 1346–1355https://doi.org/10.1145/2623330.2623682

Published:24 August 2014Publication History

KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 1346–1355

ABSTRACT

Graph clustering and graph outlier detection have been studied extensively on plain graphs, with various applications. Recently, algorithms have been extended to graphs with attributes as often observed in the real-world. However, all of these techniques fail to incorporate the user preference into graph mining, and thus, lack the ability to steer algorithms to more interesting parts of the attributed graph. In this work, we overcome this limitation and introduce a novel user-oriented approach for mining attributed graphs. The key aspect of our approach is to infer user preference by the so-called focus attributes through a set of user-provided exemplar nodes. In this new problem setting, clusters and outliers are then simultaneously mined according to this user preference. Specifically, our FocusCO algorithm identifies the focus, extracts focused clusters and detects outliers. Moreover, FocusCO scales well with graph size, since we perform a local clustering of interest to the user rather than global partitioning of the entire graph. We show the effectiveness and scalability of our method on synthetic and real-world graphs, as compared to both existing graph clustering and outlier detection approaches.

Supplemental Material

p1346-sidebyside.mp4

mp4

238.9 MB

Download

References

L. Akoglu, M. McGlohon, and C. Faloutsos. OddBall: Spotting anomalies in weighted graphs. In PAKDD, 2010. Google ScholarDigital Library
L. Akoglu, H. Tong, B. Meeder, and C. Faloutsos. PICS: Parameter-free identification of cohesive subgroups in large attributed graphs. In SIAM SDM, pages 439--450, 2012.Google ScholarCross Ref
R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. In FOCS, 2006. Google ScholarDigital Library
A. Banerjee, S. Basu, and S. Merugu. Multi-way clustering on relation graphs. In SIAM SDM, 2007.Google ScholarCross Ref
S. Basu, I. Davidson, and K. Wagstaff. Clustering with Constraints: Algorithms, Applications and Theory. 2008.Google Scholar
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, NY, USA, 2004. Google ScholarDigital Library
D. Chakrabarti, S. Papadimitriou, D. S. Modha, and C. Faloutsos. Fully automatic cross-associations. In KDD, pages 79--88, 2004. Google ScholarDigital Library
A. Clauset. Finding local community structure in networks. Physical Review E, 72(2):026132, 2005.Google ScholarCross Ref
A. Condon and R. M. Karp. Algorithms for graph partitioning on the planted partition model. Random Struct. Algorithms, 18(2):116--140, 2001. Google ScholarDigital Library
I. Dhillon, S. Mallela, and D. Modha. Information-theoretic co-clustering. In KDD, 2003. Google ScholarDigital Library
C. H. Q. Ding, X. He, H. Zha, M. Gu, and H. D. Simon. A min-max cut algorithm for graph partitioning and data clustering. In ICDM, 2001. Google ScholarDigital Library
G. W. Flake, S. Lawrence, and C. L. Giles. Efficient identification of web communities. In KDD. 2000. Google ScholarDigital Library
J. Gao, H. Cheng, and P.-N. Tan. Semi-supervised outlier detection. In ACM Symp. on Appl. Comp., 2006. Google ScholarDigital Library
J. Gao, F. Liang, W. Fan, C. Wang, Y. Sun, and J. Han. On community outliers and their efficient detection in information networks. In KDD, pages 813--822, 2010. Google ScholarDigital Library
D. F. Gleich and C. Seshadhri. Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In KDD, pages 597--605, 2012. Google ScholarDigital Library
S. Günnemann, I. Färber, B. Boden, and T. Seidl. Subspace clustering meets dense subgraph mining: A synthesis of two paradigms. In ICDM, 2010.Google ScholarDigital Library
D. Hanisch, A. Zien, R. Zimmer, and T. Lengauer. Co-clustering of biological networks and gene expression data. In ISMB, pages 145--154, 2002.Google ScholarCross Ref
P. Iglesias, E. Müller, F. Laforet, F. Keller, and K. Böhm. Statistical selection of congruent subspaces for outlier detection on attributed graphs. In ICDM, 2013.Google Scholar
G. Karypis and V. Kumar. Multilevel algorithms for multi-constraint graph partitioning. In Proc. of Supercomputing, pages 1--13, 1998. Google ScholarDigital Library
B. Long, Z. Zhang, X. Wu, and P. S. Yu. Spectral clustering for multi-type relational data. In ICML, 2006. Google ScholarDigital Library
C. D. Manning, P. Raghavan, and H. Schtze. Intro to Information Retrieval. Cambridge University Press, 2008. Google ScholarDigital Library
F. Moser, R. Colak, A. Rafiey, and M. Ester. Mining cohesive patterns from graphs with feature vectors. In SDM, 2009.Google ScholarCross Ref
E. Muller, P. I. Sanchez, Y. Mülle, and K. Böhm. Ranking outlier nodes in subspaces of attributed graphs. In ICDE Workshops, pages 216--222, 2013.Google ScholarCross Ref
A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2001.Google ScholarDigital Library
C. C. Noble and D. J. Cook. Graph-based anomaly detection. In KDD, pages 631--636, 2003. Google ScholarDigital Library
J. Tang and H. Liu. Unsupervised feature selection for linked social media data. In KDD, pages 904--912, 2012. Google ScholarDigital Library
Y. Tian, R. A. Hankins, and J. M. Patel. Efficient aggregation for graph summarization. In SIGMOD, 2008. Google ScholarDigital Library
H. Tong and C.-Y. Lin. Non-negative residual matrix factorization with application to graph anomaly detection. In SIAM SDM, pages 143--153, 2011.Google ScholarCross Ref
F. Wang and J. Sun. Distance metric learning in data mining. In SIAM SDM. Tutorials, 2012.Google Scholar
J. Whang, D. Gleich, and I. Dhillon. Overlapping community detection using seed set expansion. In CIKM, pages 2099--2108, 2013. Google ScholarDigital Library
S. White and P. Smyth. A spectral clustering approach to finding communities in graph. In SDM, 2005.Google ScholarCross Ref
E. P. Xing, A. Y. Ng, M. I. Jordan, and S. J. Russell. Distance metric learning with application to clustering with side-information. In NIPS, pages 505--512, 2002.Google ScholarDigital Library
J. Yang and J. Leskovec. Overlapping community detection at scale: a nonnegative matrix factorization approach. In WSDM, pages 587--596, 2013. Google ScholarDigital Library
Y. Zhou, H. Cheng, and J. X. Yu. Graph clustering based on structural/attribute similarities. PVLDB, 2(1):718--729, 2009. Google ScholarDigital Library

Index Terms

Focused clustering and outlier detection in large attributed graphs
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
2. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

Clustering Large Attributed Graphs: A Balance between Structural and Attribute Similarities

Social networks, sensor networks, biological networks, and many other information networks can be modeled as a large graph. Graph vertices represent entities, and graph edges represent their relationships or interactions. In many large graphs, there is ...
Read More
Extending bootstrap AMG for clustering of attributed graphs
Highlights
- A new generalized spectral clustering method to detect community in attributed graphs.
- A linear complexity (scalable) algorithm, based on a bootstrap Algebraic MultiGrid method, to estimate near-kernel components of graph Laplacian ...
Abstract
In this paper we propose a new approach to detect clusters in undirected graphs with attributed vertices. We incorporate structural and attribute similarities between the vertices in an augmented graph by creating additional vertices and edges as ...
Read More
Spectral Clustering of Attributed Multi-relational Graphs
KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

Graph clustering aims at discovering a natural grouping of the nodes such that similar nodes are assigned to a common cluster. Many different algorithms have been proposed in the literature: for simple graphs, for graphs with attributes associated to ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2014
2028 pages
ISBN:9781450329569
DOI:10.1145/2623330
General Chairs:
Sofus Macskassy
Facebook
,
Claudia Perlich
Dstillery
,
Program Chairs:
Jure Leskovec
Stanford University
,
Wei Wang
UCLA
,
Rayid Ghani
University of Chicago
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 August 2014
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
attributed graphs
clustering
distance metric learning
focused graph mining
infer user preference
outlier mining
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '14 Paper Acceptance Rate151of1,036submissions,15%Overall Acceptance Rate1,133of8,635submissions,13%
More
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 149
  Total Citations
  View Citations
- 1,536
  Total Downloads
- Downloads (Last 12 months)83
- Downloads (Last 6 weeks)11
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Focused clustering and outlier detection in large attributed graphs

KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Clustering Large Attributed Graphs: A Balance between Structural and Attribute Similarities

Extending bootstrap AMG for clustering of attributed graphs

Spectral Clustering of Attributed Multi-relational Graphs