Abstract
Given a graph G where each node is associated with a set of attributes, attributed network embedding (ANE) maps each node v ∈ G to a compact vector Xv, which can be used in downstream machine learning tasks. Ideally, Xv should capture node v's affinity to each attribute, which considers not only v's own attribute associations, but also those of its connected nodes along edges in G. It is challenging to obtain high-utility embeddings that enable accurate predictions; scaling effective ANE computation to massive graphs with millions of nodes pushes the difficulty of the problem to a whole new level. Existing solutions largely fail on such graphs, leading to prohibitive costs, low-quality embeddings, or both.
This paper proposes PANE, an effective and scalable approach to ANE computation for massive graphs that achieves state-of-the-art result quality on multiple benchmark datasets, measured by the accuracy of three common prediction tasks: attribute inference, link prediction, and node classification. In particular, for the large MAG data with over 59 million nodes, 0.98 billion edges, and 2000 attributes, PANE is the only known viable solution that obtains effective embeddings on a single server, within 12 hours.
PANE obtains high scalability and effectiveness through three main algorithmic designs. First, it formulates the learning objective based on a novel random walk model for attributed networks. The resulting optimization task is still challenging on large graphs. Second, PANE includes a highly efficient solver for the above optimization problem, whose key module is a carefully designed initialization of the embeddings, which drastically reduces the number of iterations required to converge. Finally, PANE utilizes multi-core CPUs through non-trivial parallelization of the above solver, which achieves scalability while retaining the high quality of the resulting embeddings. Extensive experiments, comparing 10 existing approaches on 8 real datasets, demonstrate that PANE consistently outperforms all existing methods in terms of result quality, while being orders of magnitude faster.
- Sambaran Bandyopadhyay, Saley Vishal Vivek, and MN Murty. 2020. Outlier Resistant Unsupervised Deep Architectures for Attributed Network Embedding. In WSDM. 25--33. Google ScholarDigital Library
- Léon Bottou. 2010. Large-scale machine learning with stochastic gradient descent. In COMPSTAT. 177--186.Google Scholar
- Yukuo Cen, Xu Zou, Jianwei Zhang, Hongxia Yang, Jingren Zhou, and Jie Tang. 2019. Representation learning for attributed multiplex heterogeneous network. In SIGKDD. 1358--1368. Google ScholarDigital Library
- Kenneth Ward Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational linguistics (1990), 22--29. Google ScholarDigital Library
- Pierre Comon, Xavier Luciani, and André LF De Almeida. 2009. Tensor decompositions, alternating least squares and other tales. J. Chemom. (2009), 393--405.Google Scholar
- Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine learning 20, 3 (1995), 273--297. Google ScholarDigital Library
- AP Dempster, NM Laird, and DB Rubin. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39, 1 (1977), 1--38.Google ScholarCross Ref
- Hongchang Gao and Heng Huang. 2018. Deep Attributed Network Embedding.. In IJCAI. Google ScholarDigital Library
- Hongchang Gao, Jian Pei, and Heng Huang. 2019. ProGAN: Network Embedding via Proximity Generative Adversarial Network. In KDD. Google ScholarDigital Library
- Gene H Golub and Christian Reinsch. 1971. Singular value decomposition and least squares solutions. Linear Algebra (1971), 134--151.Google Scholar
- Gene H Golub and Charles F Van Loan. 1996. Matrix computations. 1996. Johns Hopkins University, Press, Baltimore, MD, USA (1996).Google Scholar
- Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep learning. MIT press. Google ScholarDigital Library
- Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In NeurIPS. 2672--2680. Google ScholarDigital Library
- Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In KDD. Google ScholarDigital Library
- Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NeurIPS. Google ScholarDigital Library
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation (1997), 1735--1780.Google Scholar
- Yifan Hou, Hongzhi Chen, Changji Li, James Cheng, and Ming-Chang Yang. 2019. A Representation Learning Framework for Property Graphs. In KDD. Google ScholarDigital Library
- Xiao Huang, Jundong Li, and Xia Hu. 2017. Accelerated attributed network embedding. In SDM.Google Scholar
- Glen Jeh and Jennifer Widom. 2003. Scaling personalized web search. In WWW. 271--279. Google ScholarDigital Library
- Di Jin, Bingyi Li, Pengfei Jiao, Dongxiao He, and Weixiong Zhang. 2019. Network-Specific Variational Auto-Encoder for Embedding in Attribute Networks. In IJCAI. Google ScholarCross Ref
- Kaggle. 2012. KDD Cup. https://www.kaggle.com/c/kddcup2012-track1.Google Scholar
- Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. ICLR (2016).Google Scholar
- Adam Lerer, Ledell Wu, Jiajun Shen, Timothee Lacroix, Luca Wehrstedt, Abhijit Bose, and Alex Peysakhovich. 2019. PyTorch-BigGraph: A Large-scale Graph Embedding System. In SysML.Google Scholar
- Jure Leskovec and Julian J Mcauley. 2012. Learning to discover social circles in ego networks. In NeurIPS. 539--547. Google ScholarDigital Library
- Jie Liu, Zhicheng He, Lai Wei, and Yalou Huang. 2018. Content to node: Self-translation network embedding. In KDD. Google ScholarDigital Library
- Jianxin Ma, Peng Cui, Xiao Wang, and Wenwu Zhu. 2018. Hierarchical taxonomy aware network embedding. In KDD. Google ScholarDigital Library
- Zaiqiao Meng, Shangsong Liang, Hongyan Bao, and Xiangliang Zhang. 2019. Co-embedding Attributed Networks. In WSDM. Google ScholarDigital Library
- Zaiqiao Meng, Shangsong Liang, Xiangliang Zhang, Richard McCreadie, and Iadh Ounis. 2020. Jointly Learning Representations of Nodes and Attributes for Attributed Networks. TOIS (2020), 1--32. Google ScholarDigital Library
- Cameron Musco and Christopher Musco. 2015. Randomized block krylov methods for stronger and faster approximate singular value decomposition. In NeurIPS. Google ScholarDigital Library
- Shirui Pan, Ruiqi Hu, Guodong Long, Jing Jiang, Lina Yao, and Chengqi Zhang. 2018. Adversarially Regularized Graph Autoencoder for Graph Embedding.. In IJCAI. Google ScholarDigital Library
- Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In SIGKDD. 701--710. Google ScholarDigital Library
- Gerard Salton and Michael J McGill. 1986. Introduction to Modern Information Retrieval. Google ScholarDigital Library
- Nasrullah Sheikh, Zekarias T Kefato, and Alberto Montresor. 2019. A Simple Approach to Attributed Graph Embedding via Enhanced Autoencoder. In Complex Networks. Springer, 797--809.Google Scholar
- Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June Hsu, and Kuansan Wang. 2015. An overview of microsoft academic service (mas) and applications. In WWW. 243--246. Google ScholarDigital Library
- Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW. 1067--1077. Google ScholarDigital Library
- Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. 2006. Fast random walk with restart and its applications. In ICDM. IEEE, 613--622. Google ScholarDigital Library
- Petar Veličković, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, and R Devon Hjelm. 2019. Deep Graph Infomax. In ICLR.Google Scholar
- Stephen J Wright. 2015. Coordinate descent algorithms. Mathematical Programming (2015), 3--34. Google ScholarDigital Library
- Jun Wu and Jingrui He. 2019. Scalable manifold-regularized attributed network embedding via maximum mean discrepancy. In CIKM. 2101--2104. Google ScholarDigital Library
- Wei Wu, Bin Li, Ling Chen, and Chengqi Zhang. 2018. Efficient Attributed Network Embedding via Recursive Randomized Hashing.. In IJCAI. Google ScholarDigital Library
- Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Chang. 2015. Network representation learning with rich text information. In IJCAI. Google ScholarDigital Library
- Carl Yang, Lin Zhong, Li-Jia Li, and Luo Jie. 2017. Bi-directional joint inference for user links and attributes on large social graphs. In WWW. Google ScholarDigital Library
- Hong Yang, Shirui Pan, Ling Chen, Chuan Zhou, and Peng Zhang. 2019. Low-Bit Quantization for Attributed Network Representation Learning. In IJCAI. Google ScholarDigital Library
- Hong Yang, Shirui Pan, Peng Zhang, Ling Chen, Defu Lian, and Chengqi Zhang. 2018. Binarized attributed network embedding. In ICDM.Google Scholar
- Jaewon Yang, Julian McAuley, and Jure Leskovec. 2013. Community detection in networks with node attributes. In ICDM.Google Scholar
- Renchi Yang, Jieming Shi, Xiaokui Xiao, Yin Yang, and Sourav S Bhowmick. 2020. Homogeneous Network Embedding for Massive Graphs via Reweighted Personalized PageRank. In PVLDB. 670--683.Google Scholar
- Renchi Yang, Jieming Shi, Xiaokui Xiao, Yin Yang, Juncheng Liu, and Sourav S. Bhowmick. 2020. Scaling Attributed Network Embedding to Massive Graphs. arXiv preprint (2020).Google Scholar
- Daokun Zhang, Jie Yin, Xingquan Zhu, and Chengqi Zhang. 2016. Homophily, structure, and content augmented network representation learning. In ICDM.Google Scholar
- Zhen Zhang, Hongxia Yang, Jiajun Bu, Sheng Zhou, Pinggang Yu, Jianwei Zhang, Martin Ester, and Can Wang. 2018. ANRL: Attributed Network Representation Learning via Deep Neural Networks.. In IJCAI. Google ScholarDigital Library
- Chang Zhou, Yuqiong Liu, Xiaofei Liu, Zhongyi Liu, and Jun Gao. 2017. Scalable graph embedding for asymmetric proximity. In AAAI. Google ScholarDigital Library
- Sheng Zhou, Hongxia Yang, Xin Wang, Jiajun Bu, Martin Ester, Pinggang Yu, Jianwei Zhang, and Can Wang. 2018. PRRE: Personalized Relation Ranking Embedding for Attributed Networks. In CIKM. Google ScholarDigital Library
- Rong Zhu, Kun Zhao, Hongxia Yang, Wei Lin, Chang Zhou, Baole Ai, Yong Li, and Jingren Zhou. 2019. AliGraph: a comprehensive graph neural network platform. PVLDB (2019), 2094--2105. Google ScholarDigital Library
Recommendations
Discrete embedding for attributed graphs
AbstractAttributed graphs refer to graphs where both node links and node attributes are observable for analysis. Attributed graph embedding enables joint representation learning of node links and node attributes. Different from classical graph embedding ...
Deep attributed network embedding
IJCAI'18: Proceedings of the 27th International Joint Conference on Artificial IntelligenceNetwork embedding has attracted a surge of attention in recent years. It is to learn the low-dimensional representation for nodes in a network, which benefits downstream tasks such as node classification and link prediction. Most of the existing ...
Gaussian Embedding of Large-Scale Attributed Graphs
Databases Theory and ApplicationsAbstractGraph embedding methods transform high-dimensional and complex graph contents into low-dimensional representations. They are useful for a wide range of graph analysis tasks including link prediction, node classification, recommendation and ...
Comments