skip to main content
research-article

Discovering social circles in ego networks

Authors Info & Claims
Published:01 February 2014Publication History
Skip Abstract Section

Abstract

People's personal social networks are big and cluttered, and currently there is no good way to automatically organize them. Social networking sites allow users to manually categorize their friends into social circles (e.g., “circles” on Google+, and “lists” on Facebook and Twitter). However, circles are laborious to construct and must be manually updated whenever a user's network grows. In this article, we study the novel task of automatically identifying users' social circles. We pose this task as a multimembership node clustering problem on a user's ego network, a network of connections between her friends. We develop a model for detecting circles that combines network structure as well as user profile information. For each circle, we learn its members and the circle-specific user profile similarity metric. Modeling node membership to multiple circles allows us to detect overlapping as well as hierarchically nested circles. Experiments show that our model accurately identifies circles on a diverse set of data from Facebook, Google+, and Twitter, for all of which we obtain hand-labeled ground truth.

References

  1. D. Agarwal, B.-C. Chen, P. Elango, N. Motgi, S.-T. Park, R. Ramakrishnan, S. Roy, and J. Zachariah. 2008. Online models for content optimization. In Neural Information Processing Systems.Google ScholarGoogle Scholar
  2. Y.-Y. Ahn, J. Bagrow, and S. Lehmann. 2010. Link communities reveal multiscale complexity in networks. Nature.Google ScholarGoogle Scholar
  3. E. Airoldi, D. Blei, S. Fienberg, and E. Xing. 2008. Mixed membership stochastic blockmodels. Journal of Machine Learning Research. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. Andersen and K. Lang. 2006. Communities from seed sets. In WWW. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Balasubramanyan and W. Cohen. 2011. Block-LDA: Jointly modeling entity-annotated text and entity-entity links. In SIAM International Conference on Data Mining.Google ScholarGoogle Scholar
  6. E. Boros and P. Hammer. 2002. Pseudo-boolean optimization. Discrete Applied Mathematics. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Chang and D. Blei. 2009. Relational topic models for document networks. In International Conference on Artificial Intelligence and Statistics.Google ScholarGoogle Scholar
  8. J. Chang, J. Boyd-Graber, and D. Blei. 2009. Connections between the lines: Augmenting social networks with text. In Knowledge Discovery and Data Mining. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. H. Chen and R. Karger. 2006. Less is more: Probabilistic models for retrieving fewer relevant documents. In Special Interest Group on Information Retrieval. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Y. Chen and C. Lin. 2006. Combining SVMs with Various Feature Selection Strategies. Springer.Google ScholarGoogle Scholar
  11. K. El-Arini, G. Veda, D. Shahaf, and C. Guestrin. 2009. Turning down the noise in the blogosphere. In Knowledge Discovery and Data Mining. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Scott L. Feld. 1981. The focused organization of social ties. American Journal of Sociology.Google ScholarGoogle Scholar
  13. Mario Frank, Andreas P. Streich, David Basin, and Joachim M. Buhmann. 2012. Multi-assignment clustering for Boolean data. Journal of Machine Learning Research. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Steve Gregory. 2010a. Finding overlapping communities in networks by label propagation. New Journal of Physics.Google ScholarGoogle Scholar
  15. Steve Gregory. 2010b. Fuzzy overlapping communities in networks. CoRR abs/1010.1523.Google ScholarGoogle Scholar
  16. P. Hammer, P. Hansen, and B. Simeone. 1984. Roof duality, complementation and persistency in quadratic 0-1 optimization. Mathematical Programming.Google ScholarGoogle Scholar
  17. M. Handcock, A. Raftery, and J. Tantrum. 2007a. Model-based clustering for social networks. Journal of the Royal Statistical Society Series A.Google ScholarGoogle Scholar
  18. Mark S. Handcock, Adrian E. Raftery, and Jeremy M. Tantrum. 2007b. Model-based clustering for social networks. Journal of the Royal Statistical Society.Google ScholarGoogle Scholar
  19. M. B. Hastings. 2006. Community detection as an inference problem. Physical Review E.Google ScholarGoogle Scholar
  20. David Haussler. 1999. Convolution Kernels on Discrete Structures. Technical Report. University of California at Santa Cruz.Google ScholarGoogle Scholar
  21. Peter D. Hoff, Adrian E. Raftery, and Mark S. Handcock. 2002. Latent space approaches to social network analysis. Journal of the American Statistical Association.Google ScholarGoogle Scholar
  22. S. Johnson. 1967. Hierarchical clustering schemes. Psychometrika.Google ScholarGoogle Scholar
  23. D. Kim, Y. Jo, L.-C. Moon, and A. Oh. 2010. Analysis of Twitter lists as a potential source for discovering latent characteristics of users. In CHI.Google ScholarGoogle Scholar
  24. P. Kohli and P. Torr. 2005. Efficiently solving dynamic Markov random fields using graph cuts. In International Conference on Computer Vision. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Vladimir Kolmogorov and Carsten Rother. 2007. Minimizing nonsubmodular functions with graph cuts: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. P. Krivitsky, M. Handcock, A. Raftery, and P. Hoff. 2009. Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models. Social Networks.Google ScholarGoogle Scholar
  27. Andrea Lancichinetti and Santo Fortunato. 2009a. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Physical Review E.Google ScholarGoogle Scholar
  28. A. Lancichinetti and S. Fortunato. 2009b. Community detection algorithms: A comparative analysis. arXiv:0908.1062.Google ScholarGoogle Scholar
  29. Andrea Lancichinetti, Santo Fortunato, and Janos Kertesz. 2009. Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics.Google ScholarGoogle Scholar
  30. P. Lazarsfeld and R. Merton. 1954. Friendship as a social process: A substantive and methodological analysis. In Freedom and Control in Modern Society.Google ScholarGoogle Scholar
  31. J. Leskovec, K. Lang, and M. Mahoney. 2010. Empirical comparison of algorithms for network community detection. In WWW. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Y. Liu, A. Niculescu-Mizil, and W. Gryc. 2009. Topic-link LDA: Joint models of topic and author community. In International Conference on Machine Learning. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. D. MacKay. 2003. Information Theory, Inference and Learning Algorithms. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. McAuley and J. Leskovec. 2012. Learning to discover social circles in ego networks. In Neural Information Processing Systems.Google ScholarGoogle Scholar
  35. M. McPherson. 1983. An ecology of affiliation. American Sociological Review.Google ScholarGoogle Scholar
  36. Miller McPherson, Lynn Smith-Lovin, and James M. Cook. 2001. Birds of a feather: Homophily in social networks. Annual Review of Sociology.Google ScholarGoogle Scholar
  37. Aditya Menon and Charles Elkan. 2010. A log-linear model with latent features for dyadic prediction. In International Conference on Data Mining. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. A. Menon and C. Elkan. 2011. Link prediction via matrix factorization. In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. A. Mislove, B. Viswanath, K. Gummadi, and P. Druschel. 2010. You are who you know: Inferring user profiles in online social networks. In International Conference on Web Search and Data Mining. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. P. Nasirifard and C. Hayes. 2011. Tadvise: A Twitter assistant based on Twitter lists. In SocInfo. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. M. Newman. 2006. Modularity and community structure in networks. In Proceedings of the National Academy of Sciences.Google ScholarGoogle ScholarCross RefCross Ref
  42. M. E. J. Newman. 2003. Fast algorithm for detecting community structure in networks. Physical Review E.Google ScholarGoogle Scholar
  43. M. E. J. Newman. 2004. Detecting community structure in networks. European Physical Journal B.Google ScholarGoogle Scholar
  44. M. E. J. Newman and G. T. Barkema. 1999. Monte Carlo Methods in Statistical Physics. Oxford University Press.Google ScholarGoogle Scholar
  45. Jorge Nocedal. 1980. Updating quasi-Newton matrices with limited storage. Mathematics of Computation.Google ScholarGoogle Scholar
  46. G. Palla, I. Derenyi, I. Farkas, and T. Vicsek. 2005. Uncovering the overlapping community structure of complex networks in nature and society. Nature.Google ScholarGoogle Scholar
  47. M. A. Porter, J.-P. Onnela, and P. J. Mucha. 2009. Communities in networks.Google ScholarGoogle Scholar
  48. E. Ravasz and A.-L. Barabási. 2003. Hierarchical organization in complex networks. Physical Review E.Google ScholarGoogle Scholar
  49. C. Rother, V. Kolmogorov, V. Lempitsky, and M. Szummer. 2007. Optimizing binary MRFs via extended roof duality. In Computer Vision and Pattern Recognition.Google ScholarGoogle Scholar
  50. S. E. Schaeffer. 2007. Graph clustering. Computer Science Review. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Georg Simmel. 1964. Conflict and the Web of Group Affiliations. Simon and Schuster.Google ScholarGoogle Scholar
  52. J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. 2011. The Anatomy of the Facebook Social Graph. Preprint.Google ScholarGoogle Scholar
  53. S. V. N. Vishwanathan and Alexander J. Smola. 2002. Fast kernels for string and tree matching. In Neural Information Processing Systems.Google ScholarGoogle Scholar
  54. C. Volinsky and A. Raftery. 2000. Bayesian information criterion for censored survival models. Biometrics.Google ScholarGoogle Scholar
  55. D. Vu, A. Asuncion, D. Hunter, and P. Smyth. 2011. Dynamic egocentric models for citation networks. In International Conference on Machine Learning.Google ScholarGoogle Scholar
  56. S. Wu, J. Hofman, W. Mason, and D. Watts. 2011. Who says what to whom on Twitter. In WWW. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. J. Yang and J. Leskovec. 2012. Community-affiliation graph model for overlapping community detection. In International Conference on Data Mining. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. T. Yoshida. 2010. Toward finding hidden communities based on user profiles. In ICDM Workshops. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. J. Zhao. 2011. Examining the evolution of networks based on lists in Twitter. In International Conference on Internet Multimedia System Architectures and Applications.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Discovering social circles in ego networks

    Recommendations

    Reviews

    Yingjie Li

    A social circle in a user's ego network is a group of interconnected people that have common attributes between themselves and the user. In order to automatically detect such circles, an unsupervised/semi-supervised learning approach is designed in this paper to optimize the circle and user profile similarity parameters by using the network structure and user profile information. Researchers in information retrieval, social media, ego networks, and machine learning areas will want to study this work. The proposed approach models circles to be latent graph variables that are inferred from the user's ego network. Within each graph and driven by edges that are “likely to form within circles and unlikely to form outside of them,” the model computes the edge forming probability in a circle using the profile similarity of the edge's two user nodes, “rewarding edges that appear within circles ... and penalizing edges that appear outside of circles” via a tradeoff constant. Then the proposed machine learning algorithms calculate the profile similarity parameter. If the learning algorithm only relies on the user node attributes and edge information, the algorithm is unsupervised. If the learning algorithm starts with user-labeled members of circles, the algorithm becomes semi-supervised. The proposed algorithms have been evaluated on a social network “dataset of 1,143 ego networks and 5,541 ground-truth circles obtained from Facebook, Google+, and Twitter.” Compared to baseline methods considering network structure, profile information, or both, the proposed model has achieved decent performance and scalability in finding disjoint, overlapping, and hierarchically nested circles. Online Computing Reviews Service

    Access critical reviews of Computing literature here

    Become a reviewer for Computing Reviews.

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Knowledge Discovery from Data
      ACM Transactions on Knowledge Discovery from Data  Volume 8, Issue 1
      Casin special issue
      February 2014
      157 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/2582178
      Issue’s Table of Contents

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 February 2014
      • Accepted: 1 August 2013
      • Received: 1 October 2012
      Published in tkdd Volume 8, Issue 1

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader