skip to main content
10.1145/3341161.3342890acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

GEMSEC: graph embedding with self clustering

Published:15 January 2020Publication History

ABSTRACT

Modern graph embedding procedures can efficiently process graphs with millions of nodes. In this paper, we propose GEMSEC - a graph embedding algorithm which learns a clustering of the nodes simultaneously with computing their embedding. GEMSEC is a general extension of earlier work in the domain of sequence-based graph embedding. GEMSEC places nodes in an abstract feature space where the vertex features minimize the negative log-likelihood of preserving sampled vertex neighborhoods, and it incorporates known social network properties through a machine learning regularization. We present two new social network datasets and show that by simultaneously considering the embedding and clustering problems with respect to social properties, GEMSEC extracts high-quality clusters competitive with or superior to other community detection algorithms. In experiments, the method is found to be computationally efficient and robust to the choice of hyperparameters.

References

  1. T. Van Laarhoven and E. Marchiori, "Robust community detection methods with resolution parameter for complex detection in protein protein interaction networks," Pattern Recognition in Bioinformatics, pp. 1--13, 2012.Google ScholarGoogle Scholar
  2. L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan, "Group formation in large social networks: Membership, growth, and evolution," in Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2006, pp. 44--54.Google ScholarGoogle Scholar
  3. S. Papadopoulos, Y. Kompatsiaris, A. Vakali, and P. Spyridonos, "Community detection in social media," Data Mining and Knowledge Discovery, vol. 24, no. 3, pp. 515--554, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. J. Leskovec, A. Rajaraman, and J. D. Ullman, Mining of Massive Datasets. Cambridge University Press, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. P. Pascal and M. Latapy, In International Symposium on Computer and Information Sciences. Springer Berlin Heidelberg, 2005, ch. Computing Communities in Large Networks Using Random Walks., pp. 284--293.Google ScholarGoogle Scholar
  6. S. Gregory, "Finding overlapping communities in networks by label propagation," New Journal of Physics, vol. 12, no. 10, p. 103018, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  7. P. Goyal and E. Ferrara, "Graph embedding techniques, applications, and performance: A survey," arXiv preprint arXiv:1705.02801, 2017.Google ScholarGoogle Scholar
  8. B. Perozzi, R. Al-Rfou, and S. Skiena, "Deepwalk: Online learning of social representations." in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining., 2014.Google ScholarGoogle Scholar
  9. A. Grover and J. Leskovec, "Node2vec: Scalable feature learning for networks," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 855--864.Google ScholarGoogle Scholar
  10. W. W. Zachary, "An information flow model for conflict and fission in small groups," Journal of anthropological research, vol. 33, no. 4, pp. 452--473, 1977.Google ScholarGoogle ScholarCross RefCross Ref
  11. T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," 2013.Google ScholarGoogle Scholar
  12. J. B. Diederik P. Kingma, "Adam: A method for stochastic optimization," in Proceedings of the 3rd International Conference on Learning Representations (ICLR), 2015.Google ScholarGoogle Scholar
  13. S. Cavallari, V. W. Zheng, H. Cai, K. C.-C. Chang, and E. Cambria, "Learning community embedding with community detection and node embedding on graphs," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 377--386.Google ScholarGoogle Scholar
  14. X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, and S. Yang, "Community preserving network embedding." in AAAI, 2017, pp. 203--209.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. F. Ye, C. Chen, and Z. Zheng, "Deep autoencoder-like nonnegative matrix factorization for community detection," in Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 2018, pp. 1393--1402.Google ScholarGoogle Scholar
  16. J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, "Line: Large-scale information network embedding," in Proceedings of the 24th International Conference on World Wide Web, 2015, pp. 1067--1077.Google ScholarGoogle Scholar
  17. B. Perozzi, V. Kulkarni, H. Chen, and S. Skiena, "Don't walk, skip!: Online learning of multi-scale network embeddings," in Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, 2017, pp. 258--265.Google ScholarGoogle Scholar
  18. J. Fakcharoenphol, S. Rao, and K. Talwar, "A tight bound on approximating arbitrary metrics by tree metrics," Journal of Computer and System Sciences, vol. 69, no. 3, pp. 485--497, 2004.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. J. Matoušek, Lectures on discrete geometry. Springer, 2002, vol. 108.Google ScholarGoogle ScholarCross RefCross Ref
  20. X. Yu, X. Ban, W. Zeng, R. Sarkar, X. Gu, and J. Gao, "Spherical representation and polyhedron routing for load balancing in wireless sensor networks," in 2011 Proceedings IEEE INFOCOM. IEEE, 2011, pp. 621--625.Google ScholarGoogle Scholar
  21. K. Huang, C.-C. Ni, R. Sarkar, J. Gao, and J. S. Mitchell, "Bounded stretch geographic homotopic routing in sensor networks," in IEEE INFOCOM 2014-IEEE Conference on Computer Communications. IEEE, 2014, pp. 979--987.Google ScholarGoogle Scholar
  22. R. Sarkar, "Low distortion delaunay embedding of trees in hyperbolic plane," in International Symposium on Graph Drawing. Springer, 2011, pp. 355--366.Google ScholarGoogle Scholar
  23. C. De Sa, A. Gu, C. Ré, and F. Sala, "Representation tradeoffs for hyperbolic embeddings," Proceedings of machine learning research, vol. 80, p. 4460, 2018.Google ScholarGoogle Scholar
  24. W. Zeng, R. Sarkar, F. Luo, X. Gu, and J. Gao, "Resilient routing for sensor networks using hyperbolic embedding of universal covering space," in 2010 Proceedings IEEE INFOCOM. IEEE, 2010, pp. 1--9.Google ScholarGoogle Scholar
  25. B. Rozemberczki and R. Sarkar, "Fast sequence-based embedding with diffusion graphs," in International Workshop on Complex Networks. Springer, 2018, pp. 99--107.Google ScholarGoogle Scholar
  26. J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang, "Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 2018, pp. 459--467.Google ScholarGoogle Scholar
  27. M. Gutmann and A. Hyvarinen, "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 297--304.Google ScholarGoogle Scholar
  28. J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, D. Lazer, K. Kaski, J. Kertész, and A.-L. Barabási, "Structure and tie strengths in mobile communication networks," Proceedings of the national academy of sciences, vol. 104, no. 18, pp. 7332--7336, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  29. A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola, "Distributed large-scale natural graph factorization," in Proceedings of the 22nd international conference on World Wide Web, 2013, pp. 37--48.Google ScholarGoogle Scholar
  1. GEMSEC: graph embedding with self clustering

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        ASONAM '19: Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
        August 2019
        1228 pages
        ISBN:9781450368681
        DOI:10.1145/3341161

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 15 January 2020

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        ASONAM '19 Paper Acceptance Rate41of286submissions,14%Overall Acceptance Rate116of549submissions,21%

        Upcoming Conference

        KDD '24

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader