Abstract
Traditional approaches to community detection, as studied by physicists, sociologists, and more recently computer scientists, aim at simply partitioning the social network graph. However, with the advent of online social networking sites, richer data has become available: beyond the link information, each user in the network is annotated with additional information, for example, demographics, shopping behavior, or interests. In this context, it is therefore important to develop mining methods which can take advantage of all available information. In the case of community detection, this means finding good communities (a set of nodes cohesive in the social graph) which are associated with good descriptions in terms of user information (node attributes).
Having good descriptions associated to our models make them understandable by domain experts and thus more useful in real-world applications. Another requirement dictated by real-world applications, is to develop methods that can use, when available, any domain-specific background knowledge. In the case of community detection the background knowledge could be a vague description of the communities sought in a specific application, or some prototypical nodes (e.g., good customers in the past), that represent what the analyst is looking for (a community of similar users).
Towards this goal, in this article, we define and study the problem of finding a diverse set of cohesive communities with concise descriptions. We propose an effective algorithm that alternates between two phases: a hill-climbing phase producing (possibly overlapping) communities, and a description induction phase which uses techniques from supervised pattern set mining. Our framework has the nice feature of being able to build well-described cohesive communities starting from any given description or seed set of nodes, which makes it very flexible and easily applicable in real-world applications.
Our experimental evaluation confirms that the proposed method discovers cohesive communities with concise descriptions in realistic and large online social networks such as Delicious, Flickr, and LastFM.
- Rakesh Agrawal, T. Imielinksi, and A. Swami. 1993. Mining association rules between sets of items in large databases. In Proceedings of SIGMOD'93. ACM, 207--216. Google ScholarDigital Library
- Yong-Yeol Ahn, James P. Bagrow, and Sune Lehmann. 2010. Link communities reveal multiscale complexity in networks. Nature 466, 7307, 761--764.Google Scholar
- Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, and Eric P. Xing. 2008. Mixed membership stochastic blockmodels. J. Machine Learn. Res. 9, 1981--2014. Google ScholarDigital Library
- Martin Atzmueller and Folke Mitzlaff. 2011. Efficient descriptive community mining. In Proceedings of the 24th International FLAIRS Conference. AAAI Press.Google Scholar
- Brian Ball, Brian Karrer, and M. E. J. Newman. 2011. An efficient and principled method for detecting communities in networks. Phys. Rev. E 84.Google ScholarCross Ref
- Nikhil Bansal, Avrim Blum, and Shuchi Chawla. 2004. Correlation clustering. Machine Learn. 56, 1--3, 89--113. Google ScholarDigital Library
- Nicola Barbieri, Francesco Bonchi, and Giuseppe Manco. 2013. Cascade-based community detection. In Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM'13). Google ScholarDigital Library
- Jeffrey Baumes, Mark K. Goldberg, Mukkai S. Krishnamoorthy, Malik Magdon-Ismail, and Nathan Preston. 2005. Finding communities by clustering a graph into overlapping subgraphs. In Proceedings of the IADIS International Conference on Applied Computing (IADIS AC'05). 97--104.Google Scholar
- Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Antti Ukkonen. 2012. Chromatic correlation clustering. In Proceedings of the KDD. Google ScholarDigital Library
- L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification and Regression Trees. Wadsworth.Google Scholar
- Fabricio A. Breve, Liang Zhao, Marcos G. Quiles, Witold Pedrycz, and Jiming Liu. 2011. Particle competition and cooperation for uncovering network overlap community structure. In Proceedings of the 8th International Symposium on Advances in Neural Networks (ISNN). Google ScholarDigital Library
- B. Bringmann, S. Nijssen, N. Tatti, J. Vreeken, and A. Zimmerman. 2010. Mining sets of patterns. In Proceedings of the Tutorial at ECML PKDD'10. http://www.cs.kuleuven.be/conference/msop/.Google Scholar
- Wei Chen, Zhenming Liu, Xiaorui Sun, and Yajun Wang. 2010. A game-theoretic framework to identify overlapping communities in social networks. Data Min. Knowl. Discov. 21, 2, 224--240. Google ScholarDigital Library
- G. Dong and J. Li. 1999. Efficient mining of emerging patterns: Discovering trends and differences. In Proceedings of the SIGKDD'99. 43--52. Google ScholarDigital Library
- T. S. Evans and R. Lambiotte. 2009. Line graphs, link partitions, and overlapping communities. Phys. Rev. E 80.Google ScholarCross Ref
- Santo Fortunato. 2010. Community detection in graphs. Phys. Rep. 486.Google Scholar
- M. Girvan and M. Newman. 2002. Community structure in social and biological networks. Proc. Nat. Acad. Sci. 99, 12, 7821--7826.Google ScholarCross Ref
- Steve Gregory. 2010. Finding overlapping communities in networks by label propagation. New J. Physics 12, 10.Google ScholarCross Ref
- Arijit Khan, Xifeng Yan, and Kun-Lung Wu. 2010. Towards proximity pattern mining in large graphs. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'10). ACM, New York, NY, 867--878. Google ScholarDigital Library
- Youngdo Kim and Hawoong Jeong. 2011. The map equation for link communities. Phys. Rev. E 84.Google Scholar
- Jussi M. Kumpula, Mikko Kivelä, Kimmo Kaski, and Jari Saramäki. 2008. Sequential algorithm for fast clique percolation. Phys. Rev. E 78.Google ScholarCross Ref
- Andrea Lancichinetti, Santo Fortunato, and Janos Kertesz. 2009. Detecting the overlapping and hierarchical community structure of complex networks. New J. Physics 11.Google Scholar
- Andrea Lancichinetti, Filippo Radicchi, J. J. Ramasco, and Santo Fortunato. 2011. Finding statistically significant communities in networks. PLoS ONE 6, 4, e18961.Google ScholarCross Ref
- Silvio Lattanzi and D. Sivakumar. 2009. Affiliation networks. In Proceedings of STOC. 427--434. Google ScholarDigital Library
- D. Leman, A. Feelders, and A. Knobbe. 2008. Exceptional Model Mining. In Proceedings of the ECML/PKDD'08. Vol. 2. 1--16. Google ScholarDigital Library
- M. McPherson, L. Smith-Lovin, and J. M. Cook. 2001. Birds of a feather: Homophily in social networks. Ann. Rev. Sociol. 27.Google ScholarCross Ref
- Flavia Moser, Recep Colak, Arash Rafiey, and Martin Ester. 2009. Mining cohesive patterns from graphs with feature vectors. In Proceedings of the SIAM International Conference on Data Mining (SDM). SIAM, 593--604.Google ScholarCross Ref
- M. Newman. 2004. Fast algorithm for detecting community structure in networks. Phys. Rev. E 69.Google Scholar
- M. Newman and M. Girvan. 2004. Finding and evaluating community structure in networks. Phys. Rev. E 69.Google Scholar
- Arnau Padrol-Sureda, Guillem Perarnau-Llobet, Julian Pfeifle, and Victor Muntés-Mulero. 2010. Overlapping community search for social networks. In Proceedings of the International Conference on Data Engineering (ICDE). 992--995.Google ScholarCross Ref
- G. Palla, I. Derenyi, I. Farkas, and T. Vicsek. 2005. Uncovering the overlapping community structure of complex networks in nature and society. Nature.Google Scholar
- G. Palla, I. J. Farkas, P. Pollner, I. Derenyi, and T. Vicsek. 2007. Directed network modules. New J. Phys. 9, 6.Google ScholarCross Ref
- J. R. Quinlan. 1993. C4.5: Programs for Machine Learning. Morgan-Kaufmann. Google ScholarDigital Library
- Usha Nandini Raghavan, Réka Albert, and Soundar Kumara. 2007. Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76.Google ScholarCross Ref
- Ron Shamir, Roded Sharan, and Dekel Tsur. 2004. Cluster graph modification problems. Discrete Appl. Math. 144, 1--2, 173--182. Google ScholarDigital Library
- Arlei Silva, Wagner Meira, Jr., and Mohammed J. Zaki. 2010. Structural correlation pattern mining for large graphs. In Proceedings of the 8th Workshop on Mining and Learning with Graphs (MLG'10). ACM, New York, NY, 119--126. Google ScholarDigital Library
- Lei Tang, Xufei Wang, and Huan Liu. 2009. Uncoverning groups via heterogeneous interaction analysis. In Proceedings of the 9th IEEE International Conference on Data Mining (ICDM'09). 503--512. Google ScholarDigital Library
- M. van Leeuwen. 2010. Maximal exceptions with minimal descriptions. Data Min. Knowl. Discov. 21, 2, 259--276. Google ScholarDigital Library
- Matthijs van Leeuwen, Francesco Bonchi, Börkur Sigurbjörnsson, and Arno Siebes. 2009. Compressing tags to find interesting media groups. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM'09). 1147--1156. Google ScholarDigital Library
- Xufei Wang, Lei Tang, Huiji Gao, and Huan Liu. 2010. Discovering overlapping groups in social media. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM). Google ScholarDigital Library
- J. Xie, S. Kelley, and B. Szymanski. 2013. Overlapping community detection in networks: The state of the art and comparative study. ACM Comput. Surv. 45, 4. Google ScholarDigital Library
- Jierui Xie and Boleslaw K. Szymanski. 2012. Towards linear time overlapping community detection in social networks. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). Google ScholarDigital Library
- Jierui Xie, Boleslaw K. Szymanski, and Xiaoming Liu. 2011. SLPA: Uncovering overlapping communities in social networks via a speaker-listener interaction dynamic process. In Proceedings of the ICDM Workshops. 344--349. Google ScholarDigital Library
- Yang Zhou, Hong Cheng, and Jeffrey Xu Yu. 2009. Graph clustering based on structural/attribute similarities. Proc. Very Large Datab. 2, 1, 718--729. Google ScholarDigital Library
- Albrecht Zimmermann, Björn Bringmann, and Ulrich Rückert. 2010. Fast, effective molecular feature mining by local optimization. In Proceedings of the ECML/PKDD'10. 563--578. Google ScholarDigital Library
Index Terms
- Description-Driven Community Detection
Recommendations
Circle based community detection
I-CARE '13: Proceedings of the 5th IBM Collaborative Academia Research Exchange WorkshopThe connection patterns among individuals or objects in complex (social) networks possess rich information that can be useful for conducting effecient network analysis. In particular we consider the task of community detection in social networks. ...
Community evaluation in Facebook groups
AbstractOne of the main points of the Next Generation Internet is to have a user-centric approach where daily behavior and social life of the users are studied and analyzed in order to model networks and services. Indeed, social life represents the ...
Community detection in social networks using user frequent pattern mining
Recently, social networking sites are offering a rich resource of heterogeneous data. The analysis of such data can lead to the discovery of unknown information and relations in these networks. The detection of communities including `similar' nodes is a ...
Comments