skip to main content
10.1145/3394486.3403168acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training

Published:20 August 2020Publication History

ABSTRACT

Graph representation learning has emerged as a powerful technique for addressing real-world problems. Various downstream graph learning tasks have benefited from its recent developments, such as node classification, similarity search, and graph classification. However, prior arts on graph representation learning focus on domain specific problems and train a dedicated model for each graph dataset, which is usually non-transferable to out-of-domain data. Inspired by the recent advances in pre-training from natural language processing and computer vision, we design Graph Contrastive Coding (GCC) --- a self-supervised graph neural network pre-training framework --- to capture the universal network topological properties across multiple networks. We design GCC's pre-training task as subgraph instance discrimination in and across networks and leverage contrastive learning to empower graph neural networks to learn the intrinsic and transferable structural representations. We conduct extensive experiments on three graph learning tasks and ten graph datasets. The results show that GCC pre-trained on a collection of diverse datasets can achieve competitive or better performance to its task-specific and trained-from-scratch counterparts. This suggests that the pre-training and fine-tuning paradigm presents great potential for graph representation learning.

References

  1. Réka Albert and Albert-László Barabási. 2002. Statistical mechanics of complex networks. Reviews of modern physics, Vol. 74, 1 (2002), 47.Google ScholarGoogle Scholar
  2. J Ignacio Alvarez-Hamelin, Luca Dall'Asta, Alain Barrat, and Alessandro Vespignani. 2006. Large scale networks fingerprinting and visualization using the k-core decomposition. In Advances in neural information processing systems. 41--50.Google ScholarGoogle Scholar
  3. Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. 2006. Group formation in large social networks: membership, growth, and evolution. In KDD '06 . 44--54.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et almbox. 2018. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261 (2018).Google ScholarGoogle Scholar
  5. Austin R Benson, David F Gleich, and Jure Leskovec. 2016. Higher-order organization of complex networks. Science , Vol. 353, 6295 (2016), 163--166.Google ScholarGoogle Scholar
  6. Stephen P Borgatti and Martin G Everett. 2000. Models of core/periphery structures. Social networks, Vol. 21, 4 (2000), 375--395.Google ScholarGoogle Scholar
  7. Ronald S Burt. 2009. Structural holes: The social structure of competition .Harvard university press.Google ScholarGoogle Scholar
  8. Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM transactions on intelligent systems and technology (TIST) , Vol. 2, 3 (2011), 1--27.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. 2019. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In ICLR '19 .Google ScholarGoogle Scholar
  10. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT '19. 4171--4186.Google ScholarGoogle Scholar
  11. Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable representation learning for heterogeneous networks. In KDD '17 . 135--144.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Claire Donnat, Marinka Zitnik, David Hallac, and Jure Leskovec. 2018. Learning structural node embeddings via diffusion wavelets. In KDD '18 . 1320--1329.Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. 2017. Neural message passing for quantum chemistry. In ICML '17. JMLR. org, 1263--1272.Google ScholarGoogle Scholar
  14. Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In KDD '16. 855--864.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Raia Hadsell, Sumit Chopra, and Yann LeCun. 2006. Dimensionality reduction by learning an invariant mapping. In CVPR '06, Vol. 2. IEEE, 1735--1742.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in neural information processing systems. 1024--1034.Google ScholarGoogle Scholar
  17. Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Momentum contrast for unsupervised visual representation learning. In CVPR '20 . 9729--9738.Google ScholarGoogle ScholarCross RefCross Ref
  18. Keith Henderson, Brian Gallagher, Tina Eliassi-Rad, Hanghang Tong, Sugato Basu, Leman Akoglu, Danai Koutra, Christos Faloutsos, and Lei Li. 2012. Rolx: structural role extraction & mining in large graphs. In KDD '12. 1231--1239.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Weihua Hu, Bowen Liu, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, and Jure Leskovec. 2019 b. Pre-training graph neural networks. In ICLR '19 .Google ScholarGoogle Scholar
  20. Ziniu Hu, Changjun Fan, Ting Chen, Kai-Wei Chang, and Yizhou Sun. 2019 a. Unsupervised Pre-Training of Graph Convolutional Networks. ICLR 2019 Workshop: Representation Learning on Graphs and Manifolds (2019).Google ScholarGoogle Scholar
  21. Glen Jeh and Jennifer Widom. 2002. SimRank: a measure of structural-context similarity. In KDD '02 . 538--543.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Yilun Jin, Guojie Song, and Chuan Shi. 2019. GraLSP: Graph Neural Networks with Local Structural Patterns. arXiv preprint arXiv:1911.07675 (2019).Google ScholarGoogle Scholar
  23. Kristian Kersting, Nils M. Kriege, Christopher Morris, Petra Mutzel, and Marion Neumann. 2016. Benchmark Data Sets for Graph Kernels. http://graphkernels.cs.tu-dortmund.deGoogle ScholarGoogle Scholar
  24. Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. ICLR '15 .Google ScholarGoogle Scholar
  25. Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In ICLR '17 .Google ScholarGoogle Scholar
  26. Elizabeth A Leicht, Petter Holme, and Mark EJ Newman. 2006. Vertex similarity in networks. Physical Review E , Vol. 73, 2 (2006), 026120.Google ScholarGoogle ScholarCross RefCross Ref
  27. Jure Leskovec and Christos Faloutsos. 2006. Sampling from large graphs. In KDD '06. 631--636.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2005. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD '05 . 177--187.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Silvio Micali and Zeyuan Allen Zhu. 2016. Reconstructing markov processes from independent and anonymous experiments. Discrete Applied Mathematics , Vol. 200 (2016), 108--122.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems. 3111--3119.Google ScholarGoogle Scholar
  31. Ron Milo, Shalev Itzkovitz, Nadav Kashtan, Reuven Levitt, Shai Shen-Orr, Inbal Ayzenshtat, Michal Sheffer, and Uri Alon. 2004. Superfamilies of evolved and designed networks. Science , Vol. 303, 5663 (2004), 1538--1542.Google ScholarGoogle Scholar
  32. Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. 2002. Network motifs: simple building blocks of complex networks. Science , Vol. 298, 5594 (2002), 824--827.Google ScholarGoogle Scholar
  33. Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu, and Shantanu Jaiswal. 2017. graph2vec: Learning distributed representations of graphs. arXiv preprint arXiv:1707.05005 (2017).Google ScholarGoogle Scholar
  34. Mark EJ Newman. 2006. Modularity and community structure in networks. Proceedings of the national academy of sciences , Vol. 103, 23 (2006), 8577--8582.Google ScholarGoogle ScholarCross RefCross Ref
  35. Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018).Google ScholarGoogle Scholar
  36. Jia-Yu Pan, Hyung-Jeong Yang, Christos Faloutsos, and Pinar Duygulu. 2004. Automatic multimedia cross-modal correlation discovery. In KDD '04 . 653--658.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems. 8024--8035.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et almbox. 2011. Scikit-learn: Machine learning in Python. Journal of machine learning research , Vol. 12, Oct (2011), 2825--2830.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In KDD '14 . 701--710.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Chi Wang, Kuansan Wang, and Jie Tang. 2019. Netsmf: Large-scale network embedding as sparse matrix factorization. In The World Wide Web Conference. 1509--1520.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, and Jie Tang. 2018a. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM '18 . 459--467.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Jiezhong Qiu, Jian Tang, Hao Ma, Yuxiao Dong, Kuansan Wang, and Jie Tang. 2018b. Deepinf: Social influence prediction with deep learning. In KDD '18 . 2110--2119.Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Leonardo FR Ribeiro, Pedro HP Saverese, and Daniel R Figueiredo. 2017. struc2vec: Learning node representations from structural identity. In KDD '17 . 385--394.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Scott C Ritchie, Stephen Watts, Liam G Fearnley, Kathryn E Holt, Gad Abraham, and Michael Inouye. 2016. A scalable permutation approach reveals replication and preservation patterns of network modules in large datasets. Cell systems , Vol. 3, 1 (2016), 71--82.Google ScholarGoogle Scholar
  45. Daniel A Spielman and Shang-Hua Teng. 2013. A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM Journal on computing , Vol. 42, 1 (2013), 1--26.Google ScholarGoogle Scholar
  46. Fan-Yun Sun, Jordan Hoffman, Vikas Verma, and Jian Tang. 2019. InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization. In ICLR '19 .Google ScholarGoogle Scholar
  47. Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. 2015. Line: Large-scale information network embedding. In WWW '15. 1067--1077.Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Shang-Hua Teng et almbox. 2016. Scalable algorithms for data and network analysis. Foundations and Trends® in Theoretical Computer Science , Vol. 12, 1--2 (2016), 1--274.Google ScholarGoogle Scholar
  49. Yonglong Tian, Dilip Krishnan, and Phillip Isola. 2019. Contrastive multiview coding. arXiv preprint arXiv:1906.05849 (2019).Google ScholarGoogle Scholar
  50. Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. 2006. Fast random walk with restart and its applications. In ICDM '06. IEEE, 613--622.Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Johan Ugander, Lars Backstrom, Cameron Marlow, and Jon Kleinberg. 2012. Structural diversity in social contagion. Proceedings of the National Academy of Sciences , Vol. 109, 16 (2012), 5962--5966.Google ScholarGoogle ScholarCross RefCross Ref
  52. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998--6008.Google ScholarGoogle Scholar
  53. Petar Velivc ković , Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. ICLR '18 (2018).Google ScholarGoogle Scholar
  54. Ulrike Von Luxburg. 2007. A tutorial on spectral clustering. Statistics and computing , Vol. 17, 4 (2007), 395--416.Google ScholarGoogle Scholar
  55. Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019 a. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In ICLR '19 .Google ScholarGoogle Scholar
  56. Minjie Wang, Lingfan Yu, Da Zheng, Quan Gan, Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang, Chao Ma, et almbox. 2019 b. Deep graph library: Towards efficient and scalable deep learning on graphs. arXiv preprint arXiv:1909.01315 (2019).Google ScholarGoogle Scholar
  57. Duncan J Watts and Steven H Strogatz. 1998. Collective dynamics of small-world networks. nature , Vol. 393, 6684 (1998), 440.Google ScholarGoogle Scholar
  58. Zhirong Wu, Yuanjun Xiong, Stella X Yu, and Dahua Lin. 2018. Unsupervised feature learning via non-parametric instance discrimination. In CVPR '18 . 3733--3742.Google ScholarGoogle ScholarCross RefCross Ref
  59. Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. How Powerful are Graph Neural Networks?. In ICLR '19 .Google ScholarGoogle Scholar
  60. Pinar Yanardag and SVN Vishwanathan. 2015. Deep graph kernels. In KDD '15. 1365--1374.Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Jaewon Yang and Jure Leskovec. 2015. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems , Vol. 42, 1 (2015), 181--213.Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L Hamilton, and Jure Leskovec. 2018. Graph convolutional neural networks for web-scale recommender systems. In KDD '18 . 974--983.Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Fanjin Zhang, Xiao Liu, Jie Tang, Yuxiao Dong, Peiran Yao, Jie Zhang, Xiaotao Gu, Yan Wang, Bin Shao, Rui Li, and et al. 2019 b. OAG: Toward Linking Large-Scale Heterogeneous Entity Graphs. In KDD '19 . 2585--2595.Google ScholarGoogle Scholar
  64. Jie Zhang, Yuxiao Dong, Yan Wang, Jie Tang, and Ming Ding. 2019 a. ProNE: fast and scalable network representation learning. In IJCAI '19 . 4278--4284.Google ScholarGoogle ScholarCross RefCross Ref
  65. Jing Zhang, Jie Tang, Cong Ma, Hanghang Tong, Yu Jing, and Juanzi Li. 2015. Panther: Fast top-k similarity search on large networks. In KDD '15 . 1445--1454.Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. 2018. An end-to-end deep learning architecture for graph classification. In AAAI '18 .Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
          August 2020
          3664 pages
          ISBN:9781450379984
          DOI:10.1145/3394486

          Copyright © 2020 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 20 August 2020

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate1,133of8,635submissions,13%

          Upcoming Conference

          KDD '24

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader