skip to main content
research-article

Scaling attributed network embedding to massive graphs

Published:01 September 2020Publication History
Skip Abstract Section

Abstract

Given a graph G where each node is associated with a set of attributes, attributed network embedding (ANE) maps each node vG to a compact vector Xv, which can be used in downstream machine learning tasks. Ideally, Xv should capture node v's affinity to each attribute, which considers not only v's own attribute associations, but also those of its connected nodes along edges in G. It is challenging to obtain high-utility embeddings that enable accurate predictions; scaling effective ANE computation to massive graphs with millions of nodes pushes the difficulty of the problem to a whole new level. Existing solutions largely fail on such graphs, leading to prohibitive costs, low-quality embeddings, or both.

This paper proposes PANE, an effective and scalable approach to ANE computation for massive graphs that achieves state-of-the-art result quality on multiple benchmark datasets, measured by the accuracy of three common prediction tasks: attribute inference, link prediction, and node classification. In particular, for the large MAG data with over 59 million nodes, 0.98 billion edges, and 2000 attributes, PANE is the only known viable solution that obtains effective embeddings on a single server, within 12 hours.

PANE obtains high scalability and effectiveness through three main algorithmic designs. First, it formulates the learning objective based on a novel random walk model for attributed networks. The resulting optimization task is still challenging on large graphs. Second, PANE includes a highly efficient solver for the above optimization problem, whose key module is a carefully designed initialization of the embeddings, which drastically reduces the number of iterations required to converge. Finally, PANE utilizes multi-core CPUs through non-trivial parallelization of the above solver, which achieves scalability while retaining the high quality of the resulting embeddings. Extensive experiments, comparing 10 existing approaches on 8 real datasets, demonstrate that PANE consistently outperforms all existing methods in terms of result quality, while being orders of magnitude faster.

References

  1. Sambaran Bandyopadhyay, Saley Vishal Vivek, and MN Murty. 2020. Outlier Resistant Unsupervised Deep Architectures for Attributed Network Embedding. In WSDM. 25--33. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Léon Bottou. 2010. Large-scale machine learning with stochastic gradient descent. In COMPSTAT. 177--186.Google ScholarGoogle Scholar
  3. Yukuo Cen, Xu Zou, Jianwei Zhang, Hongxia Yang, Jingren Zhou, and Jie Tang. 2019. Representation learning for attributed multiplex heterogeneous network. In SIGKDD. 1358--1368. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Kenneth Ward Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational linguistics (1990), 22--29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Pierre Comon, Xavier Luciani, and André LF De Almeida. 2009. Tensor decompositions, alternating least squares and other tales. J. Chemom. (2009), 393--405.Google ScholarGoogle Scholar
  6. Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine learning 20, 3 (1995), 273--297. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. AP Dempster, NM Laird, and DB Rubin. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39, 1 (1977), 1--38.Google ScholarGoogle ScholarCross RefCross Ref
  8. Hongchang Gao and Heng Huang. 2018. Deep Attributed Network Embedding.. In IJCAI. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Hongchang Gao, Jian Pei, and Heng Huang. 2019. ProGAN: Network Embedding via Proximity Generative Adversarial Network. In KDD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Gene H Golub and Christian Reinsch. 1971. Singular value decomposition and least squares solutions. Linear Algebra (1971), 134--151.Google ScholarGoogle Scholar
  11. Gene H Golub and Charles F Van Loan. 1996. Matrix computations. 1996. Johns Hopkins University, Press, Baltimore, MD, USA (1996).Google ScholarGoogle Scholar
  12. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep learning. MIT press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In NeurIPS. 2672--2680. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In KDD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NeurIPS. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation (1997), 1735--1780.Google ScholarGoogle Scholar
  17. Yifan Hou, Hongzhi Chen, Changji Li, James Cheng, and Ming-Chang Yang. 2019. A Representation Learning Framework for Property Graphs. In KDD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Xiao Huang, Jundong Li, and Xia Hu. 2017. Accelerated attributed network embedding. In SDM.Google ScholarGoogle Scholar
  19. Glen Jeh and Jennifer Widom. 2003. Scaling personalized web search. In WWW. 271--279. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Di Jin, Bingyi Li, Pengfei Jiao, Dongxiao He, and Weixiong Zhang. 2019. Network-Specific Variational Auto-Encoder for Embedding in Attribute Networks. In IJCAI. Google ScholarGoogle ScholarCross RefCross Ref
  21. Kaggle. 2012. KDD Cup. https://www.kaggle.com/c/kddcup2012-track1.Google ScholarGoogle Scholar
  22. Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. ICLR (2016).Google ScholarGoogle Scholar
  23. Adam Lerer, Ledell Wu, Jiajun Shen, Timothee Lacroix, Luca Wehrstedt, Abhijit Bose, and Alex Peysakhovich. 2019. PyTorch-BigGraph: A Large-scale Graph Embedding System. In SysML.Google ScholarGoogle Scholar
  24. Jure Leskovec and Julian J Mcauley. 2012. Learning to discover social circles in ego networks. In NeurIPS. 539--547. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Jie Liu, Zhicheng He, Lai Wei, and Yalou Huang. 2018. Content to node: Self-translation network embedding. In KDD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Jianxin Ma, Peng Cui, Xiao Wang, and Wenwu Zhu. 2018. Hierarchical taxonomy aware network embedding. In KDD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Zaiqiao Meng, Shangsong Liang, Hongyan Bao, and Xiangliang Zhang. 2019. Co-embedding Attributed Networks. In WSDM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Zaiqiao Meng, Shangsong Liang, Xiangliang Zhang, Richard McCreadie, and Iadh Ounis. 2020. Jointly Learning Representations of Nodes and Attributes for Attributed Networks. TOIS (2020), 1--32. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Cameron Musco and Christopher Musco. 2015. Randomized block krylov methods for stronger and faster approximate singular value decomposition. In NeurIPS. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Shirui Pan, Ruiqi Hu, Guodong Long, Jing Jiang, Lina Yao, and Chengqi Zhang. 2018. Adversarially Regularized Graph Autoencoder for Graph Embedding.. In IJCAI. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In SIGKDD. 701--710. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Gerard Salton and Michael J McGill. 1986. Introduction to Modern Information Retrieval. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Nasrullah Sheikh, Zekarias T Kefato, and Alberto Montresor. 2019. A Simple Approach to Attributed Graph Embedding via Enhanced Autoencoder. In Complex Networks. Springer, 797--809.Google ScholarGoogle Scholar
  34. Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June Hsu, and Kuansan Wang. 2015. An overview of microsoft academic service (mas) and applications. In WWW. 243--246. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW. 1067--1077. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. 2006. Fast random walk with restart and its applications. In ICDM. IEEE, 613--622. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Petar Veličković, William Fedus, William L. Hamilton, Pietro Liò, Yoshua Bengio, and R Devon Hjelm. 2019. Deep Graph Infomax. In ICLR.Google ScholarGoogle Scholar
  38. Stephen J Wright. 2015. Coordinate descent algorithms. Mathematical Programming (2015), 3--34. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Jun Wu and Jingrui He. 2019. Scalable manifold-regularized attributed network embedding via maximum mean discrepancy. In CIKM. 2101--2104. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Wei Wu, Bin Li, Ling Chen, and Chengqi Zhang. 2018. Efficient Attributed Network Embedding via Recursive Randomized Hashing.. In IJCAI. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Chang. 2015. Network representation learning with rich text information. In IJCAI. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Carl Yang, Lin Zhong, Li-Jia Li, and Luo Jie. 2017. Bi-directional joint inference for user links and attributes on large social graphs. In WWW. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Hong Yang, Shirui Pan, Ling Chen, Chuan Zhou, and Peng Zhang. 2019. Low-Bit Quantization for Attributed Network Representation Learning. In IJCAI. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Hong Yang, Shirui Pan, Peng Zhang, Ling Chen, Defu Lian, and Chengqi Zhang. 2018. Binarized attributed network embedding. In ICDM.Google ScholarGoogle Scholar
  45. Jaewon Yang, Julian McAuley, and Jure Leskovec. 2013. Community detection in networks with node attributes. In ICDM.Google ScholarGoogle Scholar
  46. Renchi Yang, Jieming Shi, Xiaokui Xiao, Yin Yang, and Sourav S Bhowmick. 2020. Homogeneous Network Embedding for Massive Graphs via Reweighted Personalized PageRank. In PVLDB. 670--683.Google ScholarGoogle Scholar
  47. Renchi Yang, Jieming Shi, Xiaokui Xiao, Yin Yang, Juncheng Liu, and Sourav S. Bhowmick. 2020. Scaling Attributed Network Embedding to Massive Graphs. arXiv preprint (2020).Google ScholarGoogle Scholar
  48. Daokun Zhang, Jie Yin, Xingquan Zhu, and Chengqi Zhang. 2016. Homophily, structure, and content augmented network representation learning. In ICDM.Google ScholarGoogle Scholar
  49. Zhen Zhang, Hongxia Yang, Jiajun Bu, Sheng Zhou, Pinggang Yu, Jianwei Zhang, Martin Ester, and Can Wang. 2018. ANRL: Attributed Network Representation Learning via Deep Neural Networks.. In IJCAI. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Chang Zhou, Yuqiong Liu, Xiaofei Liu, Zhongyi Liu, and Jun Gao. 2017. Scalable graph embedding for asymmetric proximity. In AAAI. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Sheng Zhou, Hongxia Yang, Xin Wang, Jiajun Bu, Martin Ester, Pinggang Yu, Jianwei Zhang, and Can Wang. 2018. PRRE: Personalized Relation Ranking Embedding for Attributed Networks. In CIKM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Rong Zhu, Kun Zhao, Hongxia Yang, Wei Lin, Chang Zhou, Baole Ai, Yong Li, and Jingren Zhou. 2019. AliGraph: a comprehensive graph neural network platform. PVLDB (2019), 2094--2105. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image Proceedings of the VLDB Endowment
    Proceedings of the VLDB Endowment  Volume 14, Issue 1
    September 2020
    73 pages
    ISSN:2150-8097
    Issue’s Table of Contents

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 September 2020
    Published in pvldb Volume 14, Issue 1

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader