skip to main content
10.1145/2623330.2623682acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Focused clustering and outlier detection in large attributed graphs

Published:24 August 2014Publication History

ABSTRACT

Graph clustering and graph outlier detection have been studied extensively on plain graphs, with various applications. Recently, algorithms have been extended to graphs with attributes as often observed in the real-world. However, all of these techniques fail to incorporate the user preference into graph mining, and thus, lack the ability to steer algorithms to more interesting parts of the attributed graph. In this work, we overcome this limitation and introduce a novel user-oriented approach for mining attributed graphs. The key aspect of our approach is to infer user preference by the so-called focus attributes through a set of user-provided exemplar nodes. In this new problem setting, clusters and outliers are then simultaneously mined according to this user preference. Specifically, our FocusCO algorithm identifies the focus, extracts focused clusters and detects outliers. Moreover, FocusCO scales well with graph size, since we perform a local clustering of interest to the user rather than global partitioning of the entire graph. We show the effectiveness and scalability of our method on synthetic and real-world graphs, as compared to both existing graph clustering and outlier detection approaches.

Skip Supplemental Material Section

Supplemental Material

p1346-sidebyside.mp4

mp4

238.9 MB

References

  1. L. Akoglu, M. McGlohon, and C. Faloutsos. OddBall: Spotting anomalies in weighted graphs. In PAKDD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. L. Akoglu, H. Tong, B. Meeder, and C. Faloutsos. PICS: Parameter-free identification of cohesive subgroups in large attributed graphs. In SIAM SDM, pages 439--450, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  3. R. Andersen, F. Chung, and K. Lang. Local graph partitioning using pagerank vectors. In FOCS, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. Banerjee, S. Basu, and S. Merugu. Multi-way clustering on relation graphs. In SIAM SDM, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  5. S. Basu, I. Davidson, and K. Wagstaff. Clustering with Constraints: Algorithms, Applications and Theory. 2008.Google ScholarGoogle Scholar
  6. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, NY, USA, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. D. Chakrabarti, S. Papadimitriou, D. S. Modha, and C. Faloutsos. Fully automatic cross-associations. In KDD, pages 79--88, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Clauset. Finding local community structure in networks. Physical Review E, 72(2):026132, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  9. A. Condon and R. M. Karp. Algorithms for graph partitioning on the planted partition model. Random Struct. Algorithms, 18(2):116--140, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. I. Dhillon, S. Mallela, and D. Modha. Information-theoretic co-clustering. In KDD, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. C. H. Q. Ding, X. He, H. Zha, M. Gu, and H. D. Simon. A min-max cut algorithm for graph partitioning and data clustering. In ICDM, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. G. W. Flake, S. Lawrence, and C. L. Giles. Efficient identification of web communities. In KDD. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. Gao, H. Cheng, and P.-N. Tan. Semi-supervised outlier detection. In ACM Symp. on Appl. Comp., 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Gao, F. Liang, W. Fan, C. Wang, Y. Sun, and J. Han. On community outliers and their efficient detection in information networks. In KDD, pages 813--822, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. D. F. Gleich and C. Seshadhri. Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In KDD, pages 597--605, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. S. Günnemann, I. Färber, B. Boden, and T. Seidl. Subspace clustering meets dense subgraph mining: A synthesis of two paradigms. In ICDM, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. D. Hanisch, A. Zien, R. Zimmer, and T. Lengauer. Co-clustering of biological networks and gene expression data. In ISMB, pages 145--154, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  18. P. Iglesias, E. Müller, F. Laforet, F. Keller, and K. Böhm. Statistical selection of congruent subspaces for outlier detection on attributed graphs. In ICDM, 2013.Google ScholarGoogle Scholar
  19. G. Karypis and V. Kumar. Multilevel algorithms for multi-constraint graph partitioning. In Proc. of Supercomputing, pages 1--13, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. B. Long, Z. Zhang, X. Wu, and P. S. Yu. Spectral clustering for multi-type relational data. In ICML, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. C. D. Manning, P. Raghavan, and H. Schtze. Intro to Information Retrieval. Cambridge University Press, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. F. Moser, R. Colak, A. Rafiey, and M. Ester. Mining cohesive patterns from graphs with feature vectors. In SDM, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  23. E. Muller, P. I. Sanchez, Y. Mülle, and K. Böhm. Ranking outlier nodes in subspaces of attributed graphs. In ICDE Workshops, pages 216--222, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  24. A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. C. C. Noble and D. J. Cook. Graph-based anomaly detection. In KDD, pages 631--636, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. J. Tang and H. Liu. Unsupervised feature selection for linked social media data. In KDD, pages 904--912, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Y. Tian, R. A. Hankins, and J. M. Patel. Efficient aggregation for graph summarization. In SIGMOD, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. H. Tong and C.-Y. Lin. Non-negative residual matrix factorization with application to graph anomaly detection. In SIAM SDM, pages 143--153, 2011.Google ScholarGoogle ScholarCross RefCross Ref
  29. F. Wang and J. Sun. Distance metric learning in data mining. In SIAM SDM. Tutorials, 2012.Google ScholarGoogle Scholar
  30. J. Whang, D. Gleich, and I. Dhillon. Overlapping community detection using seed set expansion. In CIKM, pages 2099--2108, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. S. White and P. Smyth. A spectral clustering approach to finding communities in graph. In SDM, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  32. E. P. Xing, A. Y. Ng, M. I. Jordan, and S. J. Russell. Distance metric learning with application to clustering with side-information. In NIPS, pages 505--512, 2002.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J. Yang and J. Leskovec. Overlapping community detection at scale: a nonnegative matrix factorization approach. In WSDM, pages 587--596, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Y. Zhou, H. Cheng, and J. X. Yu. Graph clustering based on structural/attribute similarities. PVLDB, 2(1):718--729, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Focused clustering and outlier detection in large attributed graphs

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
        August 2014
        2028 pages
        ISBN:9781450329569
        DOI:10.1145/2623330

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 24 August 2014

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        KDD '14 Paper Acceptance Rate151of1,036submissions,15%Overall Acceptance Rate1,133of8,635submissions,13%

        Upcoming Conference

        KDD '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader