ABSTRACT
Modern graph embedding procedures can efficiently process graphs with millions of nodes. In this paper, we propose GEMSEC - a graph embedding algorithm which learns a clustering of the nodes simultaneously with computing their embedding. GEMSEC is a general extension of earlier work in the domain of sequence-based graph embedding. GEMSEC places nodes in an abstract feature space where the vertex features minimize the negative log-likelihood of preserving sampled vertex neighborhoods, and it incorporates known social network properties through a machine learning regularization. We present two new social network datasets and show that by simultaneously considering the embedding and clustering problems with respect to social properties, GEMSEC extracts high-quality clusters competitive with or superior to other community detection algorithms. In experiments, the method is found to be computationally efficient and robust to the choice of hyperparameters.
- T. Van Laarhoven and E. Marchiori, "Robust community detection methods with resolution parameter for complex detection in protein protein interaction networks," Pattern Recognition in Bioinformatics, pp. 1--13, 2012.Google Scholar
- L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan, "Group formation in large social networks: Membership, growth, and evolution," in Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2006, pp. 44--54.Google Scholar
- S. Papadopoulos, Y. Kompatsiaris, A. Vakali, and P. Spyridonos, "Community detection in social media," Data Mining and Knowledge Discovery, vol. 24, no. 3, pp. 515--554, 2012.Google ScholarDigital Library
- J. Leskovec, A. Rajaraman, and J. D. Ullman, Mining of Massive Datasets. Cambridge University Press, 2014.Google ScholarDigital Library
- P. Pascal and M. Latapy, In International Symposium on Computer and Information Sciences. Springer Berlin Heidelberg, 2005, ch. Computing Communities in Large Networks Using Random Walks., pp. 284--293.Google Scholar
- S. Gregory, "Finding overlapping communities in networks by label propagation," New Journal of Physics, vol. 12, no. 10, p. 103018, 2010.Google ScholarCross Ref
- P. Goyal and E. Ferrara, "Graph embedding techniques, applications, and performance: A survey," arXiv preprint arXiv:1705.02801, 2017.Google Scholar
- B. Perozzi, R. Al-Rfou, and S. Skiena, "Deepwalk: Online learning of social representations." in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining., 2014.Google Scholar
- A. Grover and J. Leskovec, "Node2vec: Scalable feature learning for networks," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 855--864.Google Scholar
- W. W. Zachary, "An information flow model for conflict and fission in small groups," Journal of anthropological research, vol. 33, no. 4, pp. 452--473, 1977.Google ScholarCross Ref
- T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," 2013.Google Scholar
- J. B. Diederik P. Kingma, "Adam: A method for stochastic optimization," in Proceedings of the 3rd International Conference on Learning Representations (ICLR), 2015.Google Scholar
- S. Cavallari, V. W. Zheng, H. Cai, K. C.-C. Chang, and E. Cambria, "Learning community embedding with community detection and node embedding on graphs," in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 377--386.Google Scholar
- X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, and S. Yang, "Community preserving network embedding." in AAAI, 2017, pp. 203--209.Google ScholarDigital Library
- F. Ye, C. Chen, and Z. Zheng, "Deep autoencoder-like nonnegative matrix factorization for community detection," in Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 2018, pp. 1393--1402.Google Scholar
- J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, "Line: Large-scale information network embedding," in Proceedings of the 24th International Conference on World Wide Web, 2015, pp. 1067--1077.Google Scholar
- B. Perozzi, V. Kulkarni, H. Chen, and S. Skiena, "Don't walk, skip!: Online learning of multi-scale network embeddings," in Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, 2017, pp. 258--265.Google Scholar
- J. Fakcharoenphol, S. Rao, and K. Talwar, "A tight bound on approximating arbitrary metrics by tree metrics," Journal of Computer and System Sciences, vol. 69, no. 3, pp. 485--497, 2004.Google ScholarDigital Library
- J. Matoušek, Lectures on discrete geometry. Springer, 2002, vol. 108.Google ScholarCross Ref
- X. Yu, X. Ban, W. Zeng, R. Sarkar, X. Gu, and J. Gao, "Spherical representation and polyhedron routing for load balancing in wireless sensor networks," in 2011 Proceedings IEEE INFOCOM. IEEE, 2011, pp. 621--625.Google Scholar
- K. Huang, C.-C. Ni, R. Sarkar, J. Gao, and J. S. Mitchell, "Bounded stretch geographic homotopic routing in sensor networks," in IEEE INFOCOM 2014-IEEE Conference on Computer Communications. IEEE, 2014, pp. 979--987.Google Scholar
- R. Sarkar, "Low distortion delaunay embedding of trees in hyperbolic plane," in International Symposium on Graph Drawing. Springer, 2011, pp. 355--366.Google Scholar
- C. De Sa, A. Gu, C. Ré, and F. Sala, "Representation tradeoffs for hyperbolic embeddings," Proceedings of machine learning research, vol. 80, p. 4460, 2018.Google Scholar
- W. Zeng, R. Sarkar, F. Luo, X. Gu, and J. Gao, "Resilient routing for sensor networks using hyperbolic embedding of universal covering space," in 2010 Proceedings IEEE INFOCOM. IEEE, 2010, pp. 1--9.Google Scholar
- B. Rozemberczki and R. Sarkar, "Fast sequence-based embedding with diffusion graphs," in International Workshop on Complex Networks. Springer, 2018, pp. 99--107.Google Scholar
- J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang, "Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec," in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 2018, pp. 459--467.Google Scholar
- M. Gutmann and A. Hyvarinen, "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, pp. 297--304.Google Scholar
- J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, D. Lazer, K. Kaski, J. Kertész, and A.-L. Barabási, "Structure and tie strengths in mobile communication networks," Proceedings of the national academy of sciences, vol. 104, no. 18, pp. 7332--7336, 2007.Google ScholarCross Ref
- A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola, "Distributed large-scale natural graph factorization," in Proceedings of the 22nd international conference on World Wide Web, 2013, pp. 37--48.Google Scholar
- GEMSEC: graph embedding with self clustering
Recommendations
Deep MinCut: Learning Node Embeddings by Detecting Communities
Highlights- Our node embeddings are both interpretable and competent for classification tasks.
AbstractWe present Deep MinCut (DMC), an unsupervised approach to learn node embeddings for graph-structured data. It derives node representations based on their membership in communities. As such, the embeddings directly provide insights into ...
Network Embedding for Community Detection in Attributed Networks
Community detection aims to partition network nodes into a set of clusters, such that nodes are more densely connected to each other within the same cluster than other clusters. For attributed networks, apart from the denseness requirement of topology ...
Iterative Graph Embedding and Clustering
Advances in Computational IntelligenceAbstractGraph embedding can be seen as a transformation of any graph into low-dimensional vector space, where each vertex of the graph has a one-to-one correspondence with a vector in that space. The latest study in this field shows a particular interest ...
Comments