ABSTRACT
Graph representation learning has emerged as a powerful technique for addressing real-world problems. Various downstream graph learning tasks, such as node classification, similarity search, and graph classification, have benefited from its recent developments. However, prior work on graph representation learning focuses on domain-specific problems and trains a dedicated model for each graph dataset, which usually does not transfer to out-of-domain data. Inspired by recent advances in pre-training from natural language processing and computer vision, we design Graph Contrastive Coding (GCC), a self-supervised graph neural network pre-training framework, to capture universal network topological properties across multiple networks. We design GCC's pre-training task as subgraph instance discrimination in and across networks and leverage contrastive learning to empower graph neural networks to learn intrinsic and transferable structural representations. We conduct extensive experiments on three graph learning tasks and ten graph datasets. The results show that GCC pre-trained on a collection of diverse datasets can achieve performance competitive with or better than its task-specific, trained-from-scratch counterparts. This suggests that the pre-training and fine-tuning paradigm has great potential for graph representation learning.
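To make the idea of subgraph instance discrimination with contrastive learning more concrete, here is a minimal sketch. The abstract does not state which loss GCC uses, so the sketch assumes an InfoNCE-style objective, one common choice for instance discrimination. The function name `info_nce_loss`, the temperature value, and the toy vectors are all hypothetical; in a real setup the vectors would be embeddings that a graph neural network encoder produces for sampled subgraphs.

```python
import math


def info_nce_loss(query, keys, positive_index=0, temperature=0.07):
    """InfoNCE-style loss for one query embedding against a list of key embeddings.

    keys[positive_index] is treated as the positive (for example, another
    subgraph sampled around the same node); every other key is a negative.
    The temperature default is an illustrative assumption, not a value
    taken from the abstract.
    """
    # Dot-product similarity between the query and each key, scaled by temperature.
    logits = [
        sum(q * k for q, k in zip(query, key)) / temperature
        for key in keys
    ]
    # Numerically stable log-sum-exp over all keys.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    # Negative log-probability of selecting the positive key.
    return log_denominator - logits[positive_index]


# Toy embeddings standing in for encoder outputs (hypothetical values).
query = [1.0, 0.0]
keys = [[0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]]

loss_matched = info_nce_loss(query, keys, positive_index=0)
loss_mismatched = info_nce_loss(query, keys, positive_index=1)
```

The loss is small when the query is most similar to its designated positive and large when a negative is more similar, which is the signal that pushes an encoder to tell subgraph instances apart.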