research-article

Scaling attributed network embedding to massive graphs

Authors:
Renchi Yang

Nanyang Technological University

Nanyang Technological University
View Profile

,
Jieming Shi

Hong Kong Polytechnic University

Hong Kong Polytechnic University
View Profile

,
Xiaokui Xiao

National University of Singapore

National University of Singapore
View Profile

,
Yin Yang

Hamad bin Khalifa University

Hamad bin Khalifa University
View Profile

,
Juncheng Liu

National University of Singapore

National University of Singapore
View Profile

,
Sourav S. Bhowmick

Nanyang Technological University

Nanyang Technological University
View Profile

Proceedings of the VLDB Endowment Volume 14 Issue 1pp 37–49https://doi.org/10.14778/3421424.3421430

Published:01 September 2020Publication History

Proceedings of the VLDB Endowment

Abstract

Given a graph G where each node is associated with a set of attributes, attributed network embedding (ANE) maps each node v ∈ G to a compact vector X_v, which can be used in downstream machine learning tasks. Ideally, X_v should capture node v's affinity to each attribute, which considers not only v's own attribute associations, but also those of its connected nodes along edges in G. It is challenging to obtain high-utility embeddings that enable accurate predictions; scaling effective ANE computation to massive graphs with millions of nodes pushes the difficulty of the problem to a whole new level. Existing solutions largely fail on such graphs, leading to prohibitive costs, low-quality embeddings, or both.

This paper proposes PANE, an effective and scalable approach to ANE computation for massive graphs that achieves state-of-the-art result quality on multiple benchmark datasets, measured by the accuracy of three common prediction tasks: attribute inference, link prediction, and node classification. In particular, for the large MAG data with over 59 million nodes, 0.98 billion edges, and 2000 attributes, PANE is the only known viable solution that obtains effective embeddings on a single server, within 12 hours.

PANE obtains high scalability and effectiveness through three main algorithmic designs. First, it formulates the learning objective based on a novel random walk model for attributed networks. The resulting optimization task is still challenging on large graphs. Second, PANE includes a highly efficient solver for the above optimization problem, whose key module is a carefully designed initialization of the embeddings, which drastically reduces the number of iterations required to converge. Finally, PANE utilizes multi-core CPUs through non-trivial parallelization of the above solver, which achieves scalability while retaining the high quality of the resulting embeddings. Extensive experiments, comparing 10 existing approaches on 8 real datasets, demonstrate that PANE consistently outperforms all existing methods in terms of result quality, while being orders of magnitude faster.

References

Sambaran Bandyopadhyay, Saley Vishal Vivek, and MN Murty. 2020. Outlier Resistant Unsupervised Deep Architectures for Attributed Network Embedding. In WSDM. 25--33. Google ScholarDigital Library
Léon Bottou. 2010. Large-scale machine learning with stochastic gradient descent. In COMPSTAT. 177--186.Google Scholar
Yukuo Cen, Xu Zou, Jianwei Zhang, Hongxia Yang, Jingren Zhou, and Jie Tang. 2019. Representation learning for attributed multiplex heterogeneous network. In SIGKDD. 1358--1368. Google ScholarDigital Library
Kenneth Ward Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational linguistics (1990), 22--29. Google ScholarDigital Library
Pierre Comon, Xavier Luciani, and André LF De Almeida. 2009. Tensor decompositions, alternating least squares and other tales. J. Chemom. (2009), 393--405.Google Scholar
Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine learning 20, 3 (1995), 273--297. Google ScholarDigital Library
AP Dempster, NM Laird, and DB Rubin. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39, 1 (1977), 1--38.Google ScholarCross Ref
Hongchang Gao and Heng Huang. 2018. Deep Attributed Network Embedding.. In IJCAI. Google ScholarDigital Library
Hongchang Gao, Jian Pei, and Heng Huang. 2019. ProGAN: Network Embedding via Proximity Generative Adversarial Network. In KDD. Google ScholarDigital Library
Gene H Golub and Christian Reinsch. 1971. Singular value decomposition and least squares solutions. Linear Algebra (1971), 134--151.Google Scholar
Gene H Golub and Charles F Van Loan. 1996. Matrix computations. 1996. Johns Hopkins University, Press, Baltimore, MD, USA (1996).Google Scholar
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep learning. MIT press. Google ScholarDigital Library
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In NeurIPS. 2672--2680. Google ScholarDigital Library
Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In KDD. Google ScholarDigital Library
Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NeurIPS. Google ScholarDigital Library
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation (1997), 1735--1780.Google Scholar
Yifan Hou, Hongzhi Chen, Changji Li, James Cheng, and Ming-Chang Yang. 2019. A Representation Learning Framework for Property Graphs. In KDD. Google ScholarDigital Library
Xiao Huang, Jundong Li, and Xia Hu. 2017. Accelerated attributed network embedding. In SDM.Google Scholar
Glen Jeh and Jennifer Widom. 2003. Scaling personalized web search. In WWW. 271--279. Google ScholarDigital Library
Di Jin, Bingyi Li, Pengfei Jiao, Dongxiao He, and Weixiong Zhang. 2019. Network-Specific Variational Auto-Encoder for Embedding in Attribute Networks. In IJCAI. Google ScholarCross Ref
Kaggle. 2012. KDD Cup. https://www.kaggle.com/c/kddcup2012-track1.Google Scholar
Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. ICLR (2016).Google Scholar
Adam Lerer, Ledell Wu, Jiajun Shen, Timothee Lacroix, Luca Wehrstedt, Abhijit Bose, and Alex Peysakhovich. 2019. PyTorch-BigGraph: A Large-scale Graph Embedding System. In SysML.Google Scholar
Jure Leskovec and Julian J Mcauley. 2012. Learning to discover social circles in ego networks. In NeurIPS. 539--547. Google ScholarDigital Library
Jie Liu, Zhicheng He, Lai Wei, and Yalou Huang. 2018. Content to node: Self-translation network embedding. In KDD. Google ScholarDigital Library
Jianxin Ma, Peng Cui, Xiao Wang, and Wenwu Zhu. 2018. Hierarchical taxonomy aware network embedding. In KDD. Google ScholarDigital Library
Zaiqiao Meng, Shangsong Liang, Hongyan Bao, and Xiangliang Zhang. 2019. Co-embedding Attributed Networks. In WSDM. Google ScholarDigital Library
Zaiqiao Meng, Shangsong Liang, Xiangliang Zhang, Richard McCreadie, and Iadh Ounis. 2020. Jointly Learning Representations of Nodes and Attributes for Attributed Networks. TOIS (2020), 1--32. Google ScholarDigital Library
Cameron Musco and Christopher Musco. 2015. Randomized block krylov methods for stronger and faster approximate singular value decomposition. In NeurIPS. Google ScholarDigital Library
Shirui Pan, Ruiqi Hu, Guodong Long, Jing Jiang, Lina Yao, and Chengqi Zhang. 2018. Adversarially Regularized Graph Autoencoder for Graph Embedding.. In IJCAI. Google ScholarDigital Library
Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In SIGKDD. 701--710. Google ScholarDigital Library
Gerard Salton and Michael J McGill. 1986. Introduction to Modern Information Retrieval. Google ScholarDigital Library
Nasrullah Sheikh, Zekarias T Kefato, and Alberto Montresor. 2019. A Simple Approach to Attributed Graph Embedding via Enhanced Autoencoder. In Complex Networks. Springer, 797--809.Google Scholar
Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June Hsu, and Kuansan Wang. 2015. An overview of microsoft academic service (mas) and applications. In WWW. 243--246. Google ScholarDigital Library
Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW. 1067--1077. Google ScholarDigital Library
Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. 2006. Fast random walk with restart and its applications. In ICDM. IEEE, 613--622. Google ScholarDigital Library
Petar Veličković, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, and R Devon Hjelm. 2019. Deep Graph Infomax. In ICLR.Google Scholar
Stephen J Wright. 2015. Coordinate descent algorithms. Mathematical Programming (2015), 3--34. Google ScholarDigital Library
Jun Wu and Jingrui He. 2019. Scalable manifold-regularized attributed network embedding via maximum mean discrepancy. In CIKM. 2101--2104. Google ScholarDigital Library
Wei Wu, Bin Li, Ling Chen, and Chengqi Zhang. 2018. Efficient Attributed Network Embedding via Recursive Randomized Hashing.. In IJCAI. Google ScholarDigital Library
Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Chang. 2015. Network representation learning with rich text information. In IJCAI. Google ScholarDigital Library
Carl Yang, Lin Zhong, Li-Jia Li, and Luo Jie. 2017. Bi-directional joint inference for user links and attributes on large social graphs. In WWW. Google ScholarDigital Library
Hong Yang, Shirui Pan, Ling Chen, Chuan Zhou, and Peng Zhang. 2019. Low-Bit Quantization for Attributed Network Representation Learning. In IJCAI. Google ScholarDigital Library
Hong Yang, Shirui Pan, Peng Zhang, Ling Chen, Defu Lian, and Chengqi Zhang. 2018. Binarized attributed network embedding. In ICDM.Google Scholar
Jaewon Yang, Julian McAuley, and Jure Leskovec. 2013. Community detection in networks with node attributes. In ICDM.Google Scholar
Renchi Yang, Jieming Shi, Xiaokui Xiao, Yin Yang, and Sourav S Bhowmick. 2020. Homogeneous Network Embedding for Massive Graphs via Reweighted Personalized PageRank. In PVLDB. 670--683.Google Scholar
Renchi Yang, Jieming Shi, Xiaokui Xiao, Yin Yang, Juncheng Liu, and Sourav S. Bhowmick. 2020. Scaling Attributed Network Embedding to Massive Graphs. arXiv preprint (2020).Google Scholar
Daokun Zhang, Jie Yin, Xingquan Zhu, and Chengqi Zhang. 2016. Homophily, structure, and content augmented network representation learning. In ICDM.Google Scholar
Zhen Zhang, Hongxia Yang, Jiajun Bu, Sheng Zhou, Pinggang Yu, Jianwei Zhang, Martin Ester, and Can Wang. 2018. ANRL: Attributed Network Representation Learning via Deep Neural Networks.. In IJCAI. Google ScholarDigital Library
Chang Zhou, Yuqiong Liu, Xiaofei Liu, Zhongyi Liu, and Jun Gao. 2017. Scalable graph embedding for asymmetric proximity. In AAAI. Google ScholarDigital Library
Sheng Zhou, Hongxia Yang, Xin Wang, Jiajun Bu, Martin Ester, Pinggang Yu, Jianwei Zhang, and Can Wang. 2018. PRRE: Personalized Relation Ranking Embedding for Attributed Networks. In CIKM. Google ScholarDigital Library
Rong Zhu, Kun Zhao, Hongxia Yang, Wei Lin, Chang Zhou, Baole Ai, Yong Li, and Jingren Zhou. 2019. AliGraph: a comprehensive graph neural network platform. PVLDB (2019), 2094--2105. Google ScholarDigital Library

Recommendations

Discrete embedding for attributed graphs
Abstract
Attributed graphs refer to graphs where both node links and node attributes are observable for analysis. Attributed graph embedding enables joint representation learning of node links and node attributes. Different from classical graph embedding ...
Read More
Deep attributed network embedding
IJCAI'18: Proceedings of the 27th International Joint Conference on Artificial Intelligence

Network embedding has attracted a surge of attention in recent years. It is to learn the low-dimensional representation for nodes in a network, which benefits downstream tasks such as node classification and link prediction. Most of the existing ...
Read More
Gaussian Embedding of Large-Scale Attributed Graphs
Databases Theory and Applications
Abstract
Graph embedding methods transform high-dimensional and complex graph contents into low-dimensional representations. They are useful for a wide range of graph analysis tasks including link prediction, node classification, recommendation and ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
Proceedings of the VLDB Endowment Volume 14, Issue 1
September 2020
73 pages
ISSN:2150-8097
Editors:
Xin Luna Dong
Amazon
,
Felix Naumann
HPI, University of Potsdam
Issue’s Table of Contents
Sponsors
In-Cooperation
Publisher
VLDB Endowment
Publication History
- Published: 1 September 2020
Published in pvldb Volume 14, Issue 1
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 10
  Total Citations
  View Citations
- 213
  Total Downloads
- Downloads (Last 12 months)60
- Downloads (Last 6 weeks)5
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Scaling attributed network embedding to massive graphs

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Discrete embedding for attributed graphs

Deep attributed network embedding

Gaussian Embedding of Large-Scale Attributed Graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Scaling attributed network embedding to massive graphs

Proceedings of the VLDB Endowment

Abstract

References

Cited By

Recommendations

Discrete embedding for attributed graphs

Deep attributed network embedding

Gaussian Embedding of Large-Scale Attributed Graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media