ABSTRACT
We study the problem of representation learning in heterogeneous networks. The unique challenges of this problem stem from the existence of multiple types of nodes and links, which limits the feasibility of conventional network embedding techniques. We develop two scalable representation learning models, namely metapath2vec and metapath2vec++. The metapath2vec model formalizes meta-path-based random walks to construct the heterogeneous neighborhood of a node and then leverages a heterogeneous skip-gram model to learn node embeddings. The metapath2vec++ model further enables the simultaneous modeling of structural and semantic correlations in heterogeneous networks. Extensive experiments show that metapath2vec and metapath2vec++ not only outperform state-of-the-art embedding models in various heterogeneous network mining tasks, such as node classification, clustering, and similarity search, but also discern the structural and semantic correlations between diverse network objects.
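To make the idea of a meta-path-based random walk concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy academic network with author (A), paper (P), and venue (V) nodes, a symmetric meta-path written as a string such as "APVPA", and uniform sampling among neighbors of the required type. The function names (`build_typed_adjacency`, `metapath_walk`, `context_pairs`) and the toy data are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def build_typed_adjacency(edges, node_type):
    """Map node -> neighbor type -> list of neighbors (edges are undirected)."""
    adj = defaultdict(lambda: defaultdict(list))
    for u, v in edges:
        adj[u][node_type[v]].append(v)
        adj[v][node_type[u]].append(u)
    return adj


def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Random walk whose node types follow the meta-path, repeated cyclically.

    Assumption: the meta-path is symmetric (first type == last type), so it
    can be chained, e.g. "APVPA" is followed as A-P-V-P-A-P-V-P-A-...
    """
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must start and end with the same type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first meta-path type")
    cycle = metapath[:-1]  # "APVPA" -> "APVP", the repeating unit
    walk = [start]
    for step in range(1, walk_length):
        next_type = cycle[step % len(cycle)]
        candidates = adj[walk[-1]][next_type]
        if not candidates:  # dead end: no neighbor of the required type
            break
        walk.append(rng.choice(candidates))
    return walk


def context_pairs(walk, window):
    """(center, context) pairs within `window` positions, as in skip-gram."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Toy heterogeneous network (hypothetical data).
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "v1": "V"}
edges = [("a1", "p1"), ("a2", "p2"), ("p1", "v1"), ("p2", "v1")]
adj = build_typed_adjacency(edges, node_type)

walk = metapath_walk(adj, node_type, "a1", "APVPA", 9, random.Random(0))
walk_types = "".join(node_type[n] for n in walk)
pairs = context_pairs(walk, window=1)
print(walk, walk_types, len(pairs))
```

The resulting (center, context) pairs are what a skip-gram model would consume. In this sketch the pairs mix node types freely; any type-aware handling of the context, such as the heterogeneous skip-gram named in the abstract, is outside the scope of the example.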
Index Terms
- metapath2vec: Scalable Representation Learning for Heterogeneous Networks