Abstract
Machine learning over graphs has emerged as a powerful tool for graph data. However, it is challenging for industrial communities to leverage techniques such as graph neural networks (GNNs) to solve real-world problems at scale, because of the inherent data dependency in graphs. As a result, we cannot simply train a GNN with classic learning systems, for instance, a parameter server that assumes data parallelism. Existing systems store the graph data in memory for fast access, either in a single machine or in remote graph stores. Their drawbacks are three-fold. First, they cannot scale, because of the limited volume of memory or the limited bandwidth between graph stores and workers. Second, they require extra development of graph stores instead of exploiting mature infrastructures, such as MapReduce, that guarantee good system properties. Third, they focus on training but ignore optimizing the performance of inference over graphs, which leaves them unintegrated systems.
In this paper, we present AGL, a scalable and integrated system with fully functional training and inference for GNNs. Our system design follows the message passing scheme underlying the computation of GNNs. We generate the K-hop neighborhood, an information-complete subgraph for each node, and perform inference simply by merging values from in-edge neighbors and propagating values to out-edge neighbors via MapReduce. Moreover, because each K-hop neighborhood is an information-complete subgraph, nodes become independent of one another, so training can be done straightforwardly on parameter servers. AGL, implemented on mature infrastructures, can finish the training of a 2-layer GNN on a graph with billions of nodes and hundreds of billions of edges in 14 hours, and complete the inference in 1.2 hours.
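To make the message passing scheme concrete, the following is a minimal Python sketch, not the AGL implementation, of one GNN propagation step expressed as map/shuffle/reduce over a directed edge list. The scalar node values and the sum-based merge are simplifying assumptions for illustration; a real system would carry feature vectors and a learned aggregation.

```python
from collections import defaultdict

def gnn_layer_mapreduce(edges, values):
    """One GNN propagation step expressed in MapReduce style.

    edges:  list of (src, dst) directed edges
    values: dict mapping node -> scalar value (a stand-in for an embedding)
    """
    # Map: each node emits its current value along its out-edges.
    emitted = [(dst, values[src]) for src, dst in edges]

    # Shuffle: group the emitted values by destination node.
    inbox = defaultdict(list)
    for dst, v in emitted:
        inbox[dst].append(v)

    # Reduce: each node merges (here, sums) the values received
    # from its in-edge neighbors.
    return {node: sum(msgs) for node, msgs in inbox.items()}

edges = [("a", "b"), ("a", "c"), ("b", "c")]
values = {"a": 1.0, "b": 2.0, "c": 3.0}
print(gnn_layer_mapreduce(edges, values))  # {'b': 1.0, 'c': 3.0}
```

Stacking K such rounds gives each node the influence of its K-hop neighborhood, which is why materializing that neighborhood as a subgraph makes each training example self-contained.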