skip to main content
research-article

Description-Driven Community Detection

Published:30 April 2014Publication History
Skip Abstract Section

Abstract

Traditional approaches to community detection, as studied by physicists, sociologists, and more recently computer scientists, aim at simply partitioning the social network graph. However, with the advent of online social networking sites, richer data has become available: beyond the link information, each user in the network is annotated with additional information, for example, demographics, shopping behavior, or interests. In this context, it is therefore important to develop mining methods which can take advantage of all available information. In the case of community detection, this means finding good communities (a set of nodes cohesive in the social graph) which are associated with good descriptions in terms of user information (node attributes).

Having good descriptions associated to our models make them understandable by domain experts and thus more useful in real-world applications. Another requirement dictated by real-world applications, is to develop methods that can use, when available, any domain-specific background knowledge. In the case of community detection the background knowledge could be a vague description of the communities sought in a specific application, or some prototypical nodes (e.g., good customers in the past), that represent what the analyst is looking for (a community of similar users).

Towards this goal, in this article, we define and study the problem of finding a diverse set of cohesive communities with concise descriptions. We propose an effective algorithm that alternates between two phases: a hill-climbing phase producing (possibly overlapping) communities, and a description induction phase which uses techniques from supervised pattern set mining. Our framework has the nice feature of being able to build well-described cohesive communities starting from any given description or seed set of nodes, which makes it very flexible and easily applicable in real-world applications.

Our experimental evaluation confirms that the proposed method discovers cohesive communities with concise descriptions in realistic and large online social networks such as Delicious, Flickr, and LastFM.

References

  1. Rakesh Agrawal, T. Imielinksi, and A. Swami. 1993. Mining association rules between sets of items in large databases. In Proceedings of SIGMOD'93. ACM, 207--216. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Yong-Yeol Ahn, James P. Bagrow, and Sune Lehmann. 2010. Link communities reveal multiscale complexity in networks. Nature 466, 7307, 761--764.Google ScholarGoogle Scholar
  3. Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, and Eric P. Xing. 2008. Mixed membership stochastic blockmodels. J. Machine Learn. Res. 9, 1981--2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Martin Atzmueller and Folke Mitzlaff. 2011. Efficient descriptive community mining. In Proceedings of the 24th International FLAIRS Conference. AAAI Press.Google ScholarGoogle Scholar
  5. Brian Ball, Brian Karrer, and M. E. J. Newman. 2011. An efficient and principled method for detecting communities in networks. Phys. Rev. E 84.Google ScholarGoogle ScholarCross RefCross Ref
  6. Nikhil Bansal, Avrim Blum, and Shuchi Chawla. 2004. Correlation clustering. Machine Learn. 56, 1--3, 89--113. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Nicola Barbieri, Francesco Bonchi, and Giuseppe Manco. 2013. Cascade-based community detection. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM'13). Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Jeffrey Baumes, Mark K. Goldberg, Mukkai S. Krishnamoorthy, Malik Magdon-Ismail, and Nathan Preston. 2005. Finding communities by clustering a graph into overlapping subgraphs. In Proceedings of the IADIS International Conference on Applied Computing (IADIS AC'05). 97--104.Google ScholarGoogle Scholar
  9. Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Antti Ukkonen. 2012. Chromatic correlation clustering. In Proceedings of the KDD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification and Regression Trees. Wadsworth.Google ScholarGoogle Scholar
  11. Fabricio A. Breve, Liang Zhao, Marcos G. Quiles, Witold Pedrycz, and Jiming Liu. 2011. Particle competition and cooperation for uncovering network overlap community structure. In Proceedings of the 8th International Symposium on Advances in Neural Networks (ISNN). Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. B. Bringmann, S. Nijssen, N. Tatti, J. Vreeken, and A. Zimmerman. 2010. Mining sets of patterns. In Proceedings of the Tutorial at ECML PKDD'10. http://www.cs.kuleuven.be/conference/msop/.Google ScholarGoogle Scholar
  13. Wei Chen, Zhenming Liu, Xiaorui Sun, and Yajun Wang. 2010. A game-theoretic framework to identify overlapping communities in social networks. Data Min. Knowl. Discov. 21, 2, 224--240. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. G. Dong and J. Li. 1999. Efficient mining of emerging patterns: Discovering trends and differences. In Proceedings of the SIGKDD'99. 43--52. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. T. S. Evans and R. Lambiotte. 2009. Line graphs, link partitions, and overlapping communities. Phys. Rev. E 80.Google ScholarGoogle ScholarCross RefCross Ref
  16. Santo Fortunato. 2010. Community detection in graphs. Phys. Rep. 486.Google ScholarGoogle Scholar
  17. M. Girvan and M. Newman. 2002. Community structure in social and biological networks. Proc. Nat. Acad. Sci. 99, 12, 7821--7826.Google ScholarGoogle ScholarCross RefCross Ref
  18. Steve Gregory. 2010. Finding overlapping communities in networks by label propagation. New J. Physics 12, 10.Google ScholarGoogle ScholarCross RefCross Ref
  19. Arijit Khan, Xifeng Yan, and Kun-Lung Wu. 2010. Towards proximity pattern mining in large graphs. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'10). ACM, New York, NY, 867--878. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Youngdo Kim and Hawoong Jeong. 2011. The map equation for link communities. Phys. Rev. E 84.Google ScholarGoogle Scholar
  21. Jussi M. Kumpula, Mikko Kivelä, Kimmo Kaski, and Jari Saramäki. 2008. Sequential algorithm for fast clique percolation. Phys. Rev. E 78.Google ScholarGoogle ScholarCross RefCross Ref
  22. Andrea Lancichinetti, Santo Fortunato, and Janos Kertesz. 2009. Detecting the overlapping and hierarchical community structure of complex networks. New J. Physics 11.Google ScholarGoogle Scholar
  23. Andrea Lancichinetti, Filippo Radicchi, J. J. Ramasco, and Santo Fortunato. 2011. Finding statistically significant communities in networks. PLoS ONE 6, 4, e18961.Google ScholarGoogle ScholarCross RefCross Ref
  24. Silvio Lattanzi and D. Sivakumar. 2009. Affiliation networks. In Proceedings of STOC. 427--434. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. D. Leman, A. Feelders, and A. Knobbe. 2008. Exceptional Model Mining. In Proceedings of the ECML/PKDD'08. Vol. 2. 1--16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. M. McPherson, L. Smith-Lovin, and J. M. Cook. 2001. Birds of a feather: Homophily in social networks. Ann. Rev. Sociol. 27.Google ScholarGoogle ScholarCross RefCross Ref
  27. Flavia Moser, Recep Colak, Arash Rafiey, and Martin Ester. 2009. Mining cohesive patterns from graphs with feature vectors. In Proceedings of the SIAM International Conference on Data Mining (SDM). SIAM, 593--604.Google ScholarGoogle ScholarCross RefCross Ref
  28. M. Newman. 2004. Fast algorithm for detecting community structure in networks. Phys. Rev. E 69.Google ScholarGoogle Scholar
  29. M. Newman and M. Girvan. 2004. Finding and evaluating community structure in networks. Phys. Rev. E 69.Google ScholarGoogle Scholar
  30. Arnau Padrol-Sureda, Guillem Perarnau-Llobet, Julian Pfeifle, and Victor Muntés-Mulero. 2010. Overlapping community search for social networks. In Proceedings of the International Conference on Data Engineering (ICDE). 992--995.Google ScholarGoogle ScholarCross RefCross Ref
  31. G. Palla, I. Derenyi, I. Farkas, and T. Vicsek. 2005. Uncovering the overlapping community structure of complex networks in nature and society. Nature.Google ScholarGoogle Scholar
  32. G. Palla, I. J. Farkas, P. Pollner, I. Derenyi, and T. Vicsek. 2007. Directed network modules. New J. Phys. 9, 6.Google ScholarGoogle ScholarCross RefCross Ref
  33. J. R. Quinlan. 1993. C4.5: Programs for Machine Learning. Morgan-Kaufmann. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Usha Nandini Raghavan, Réka Albert, and Soundar Kumara. 2007. Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76.Google ScholarGoogle ScholarCross RefCross Ref
  35. Ron Shamir, Roded Sharan, and Dekel Tsur. 2004. Cluster graph modification problems. Discrete Appl. Math. 144, 1--2, 173--182. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Arlei Silva, Wagner Meira, Jr., and Mohammed J. Zaki. 2010. Structural correlation pattern mining for large graphs. In Proceedings of the 8th Workshop on Mining and Learning with Graphs (MLG'10). ACM, New York, NY, 119--126. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Lei Tang, Xufei Wang, and Huan Liu. 2009. Uncoverning groups via heterogeneous interaction analysis. In Proceedings of the 9th IEEE International Conference on Data Mining (ICDM'09). 503--512. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. M. van Leeuwen. 2010. Maximal exceptions with minimal descriptions. Data Min. Knowl. Discov. 21, 2, 259--276. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Matthijs van Leeuwen, Francesco Bonchi, Börkur Sigurbjörnsson, and Arno Siebes. 2009. Compressing tags to find interesting media groups. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM'09). 1147--1156. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Xufei Wang, Lei Tang, Huiji Gao, and Huan Liu. 2010. Discovering overlapping groups in social media. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM). Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. J. Xie, S. Kelley, and B. Szymanski. 2013. Overlapping community detection in networks: The state of the art and comparative study. ACM Comput. Surv. 45, 4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Jierui Xie and Boleslaw K. Szymanski. 2012. Towards linear time overlapping community detection in social networks. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Jierui Xie, Boleslaw K. Szymanski, and Xiaoming Liu. 2011. SLPA: Uncovering overlapping communities in social networks via a speaker-listener interaction dynamic process. In Proceedings of the ICDM Workshops. 344--349. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Yang Zhou, Hong Cheng, and Jeffrey Xu Yu. 2009. Graph clustering based on structural/attribute similarities. Proc. Very Large Datab. 2, 1, 718--729. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Albrecht Zimmermann, Björn Bringmann, and Ulrich Rückert. 2010. Fast, effective molecular feature mining by local optimization. In Proceedings of the ECML/PKDD'10. 563--578. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Description-Driven Community Detection

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Intelligent Systems and Technology
        ACM Transactions on Intelligent Systems and Technology  Volume 5, Issue 2
        Special Issue on Linking Social Granularity and Functions
        April 2014
        347 pages
        ISSN:2157-6904
        EISSN:2157-6912
        DOI:10.1145/2611448
        Issue’s Table of Contents

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 30 April 2014
        • Accepted: 1 August 2013
        • Revised: 1 May 2013
        • Received: 1 October 2012
        Published in tist Volume 5, Issue 2

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader