ABSTRACT
In this paper, we present Deep Graph Kernels, a unified framework to learn latent representations of sub-structures for graphs, inspired by latest advancements in language modeling and deep learning. Our framework leverages the dependency information between sub-structures by learning their latent representations. We demonstrate instances of our framework on three popular graph kernels, namely Graphlet kernels, Weisfeiler-Lehman subtree kernels, and Shortest-Path graph kernels. Our experiments on several benchmark datasets show that Deep Graph Kernels achieve significant improvements in classification accuracy over state-of-the-art graph kernels.
Supplemental Material
- A. Andreeva, D. Howorth, S. E. Brenner, T. J. Hubbard, C. Chothia, and A. G. Murzin. Scop database in 2004: refinements integrate structure and sequence family data. Nucleic acids research, 32 (suppl 1): D226--D229, 2004.Google Scholar
- Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3: 1137--1155, 2003. Google ScholarDigital Library
- K. M. Borgwardt and H.-P. Kriegel. Shortest-path kernels on gras. In Proc. Intl. Conf. Data Mining, pages 74--81, 2005. Google ScholarDigital Library
- K. M. Borgwardt, C. S. Ong, S. Schönauer, S. V. N. Vishwanathan, A. J. Smola, and H.-P. Kriegel. Protein function prediction via gra kernels. In Proceedings of Intelligent Systems in Molecular Biology (ISMB), Detroit, USA, 2005.Google ScholarDigital Library
- C. Chang and C. Lin. LIBSVM: a library for support vector machines, 2001.Google Scholar
- F. Costa and K. De Grave. Fast neighborhood subgra pairwise distance kernel. In Proceedings of the 26th International Conference on Machine Learning, pages 255--262, 2010.Google Scholar
- A. K. Debnath, R. L. Lopez de Compadre, G. Debnath, A. J. Shusterman, and C. Hansch. Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. correlation with molecular orbital energies and hydroobicity. J Med Chem, 34: 786--797, 1991.Google ScholarCross Ref
- A. Feragen, N. Kasenburg, J. Petersen, M. de Bruijne, and K. Borgwardt. Scalable kernels for gras with continuous attributes. In Advances in Neural Information Processing Systems, pages 216--224, 2013.Google Scholar
- N. K. Fox, S. E. Brenner, and J.-M. Chandonia. Scope: Structural classification of proteins-extended, integrating scop and astral data and classification of new structures. Nucleic acids research, 42 (D1): D304--D309, 2014.Google Scholar
- T. Gärtner, P. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. In B. Schölkopf and M. K. Warmuth, editors, Proc. Annual Conf. Computational Learning Theory, pages 129--143. Springer, 2003.Google Scholar
- D. Haussler. Convolution kernels on discrete structures. Technical Report UCS-CRL-99-10, UC Santa Cruz, 1999.Google Scholar
- T. Hofmann, B. Schölkopf, and A. J. Smola. Kernel methods in machine learning. Technical Report 156, Max-Planck-Institut für biologische Kybernetik, 2006. To appear in the Annals of Statistics.Google Scholar
- T. Horvath, T. Gärtner, and S. Wrobel. Cyclic pattern kernels for predictive gra mining. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), pages 158--167, 2004. Google ScholarDigital Library
- H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled gras. In Proc. Intl. Conf. Machine Learning, pages 321--328, San Francisco, CA, 2003. Morgan Kaufmann.Google Scholar
- T. Kin, T. Kato, and K. Tsuda. Protein classification via kernel matrix completion. In B. Schölkopf, K. Tsuda, and J.-P. Vert, editors, Kernel Methods in Computational Biology, pages 261--274, Cambridge, MA; USA, 2004. MIT Press.Google Scholar
- J. Leskovec, J. Kleinberg, and C. Faloutsos. Gras over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 177--187. ACM, 2005. Google ScholarDigital Library
- C. Leslie, E. Eskin, and W. S. Noble. The spectrum kernel: A string kernel for SVM protein classification. In Proceedings of the Pacific Symposium on Biocomputing, pages 564--575, Singapore, 2002. World Scientific Publishing.Google Scholar
- O. Levy and Y. Goldberg. Neural word embedding as implicit matrix factorization. In M. Welling, Z. Ghahramani, C. Cortes, N. Lawrence, and K. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2177--2185, 2014.Google Scholar
- B. D. McKay. Nauty user's guide (version 2.4). Computer Science Dept., Australian National University, 2007.Google Scholar
- T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.Google Scholar
- T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed representations of words and rases and their compositionality. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, editors, Advances in Neural Information Processing Systems 26, 2013.Google Scholar
- B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations. arXiv preprint arXiv:1403.6652, 2014.Google Scholar
- N. Przulj. Biological network comparison using gralet degree distribution. In 2006 European Conference on Computational Biology (ECCB), September 2006.Google Scholar
- J. Ramon and T. Gärtner. Expressivity versus efficiency of gra kernels. Technical report, First International Workshop on Mining Gras, Trees and Sequences (held with ECML/PKDD'03), 2003.Google Scholar
- R. Řehůřek and P. Sojka. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45--50, Valletta, Malta, May 2010. ELRA.Google Scholar
- B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.Google Scholar
- N. Shervashidze and K. Borgwardt. Fast subtree kernels on gras. In Neural Information Processing Systems, 2010.Google Scholar
- N. Shervashidze, S. V. N. Vishwanathan, T. Petri, K. Mehlhorn, and K. Borgwardt. Efficient gralet kernels for large gra comparison. In M. Welling and D. van Dyk, editors, Proc. Intl. Conference on Artificial Intelligence and Statistics. Society for Artificial Intelligence and Statistics, 2009.Google Scholar
- A. Shrivastava and P. Li. A new space for comparing gras. In Advances in Social Networks Analysis and Mining (ASONAM), 2014 IEEE/ACM International Conference on, pages 62--71. IEEE, 2014.Google ScholarDigital Library
- A. J. Smola and R. Kondor. Kernels and regularization on gras. In B. Schölkopf and M. K. Warmuth, editors, Proc. Annual Conf. Computational Learning Theory, Lecture Notes in Comput. Sci., pages 144--158, Heidelberg, Germany, 2003. Springer-Verlag.Google ScholarCross Ref
- H. Toivonen, A. Srinivasan, R. D. King, S. Kramer, and C. Helma. Statistical evaluation of the predictive toxicology challenge 2000--2001. Bioinformatics, 19 (10): 1183--1193, July 2003.Google ScholarCross Ref
- S. V. N. Vishwanathan, K. M. Borgwardt, O. Guttman, and A. J. Smola. Kernel extrapolation. Neurocomputing, 69 (7-9): 721--729, 2006. Google ScholarDigital Library
- S. V. N. Vishwanathan, N. N. Schraudol, I. R. Kondor, and K. M. Borgwardt. Gra kernels. Journal of Machine Learning Research, 2010. In press. Google ScholarDigital Library
- N. Wale, I. A. Watson, and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. Knowledge and Information Systems, 14 (3): 347--375, 2008. Google ScholarDigital Library
Index Terms
- Deep Graph Kernels
Recommendations
node2vec: Scalable Feature Learning for Networks
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data MiningPrediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by ...
Heterogeneous Graph Neural Network
KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data MiningRepresentation learning in heterogeneous graphs aims to pursue a meaningful vector representation for each node so as to facilitate downstream applications such as link prediction, personalized recommendation, node classification, etc. This task, ...
GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training
KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data MiningGraph representation learning has emerged as a powerful technique for addressing real-world problems. Various downstream graph learning tasks have benefited from its recent developments, such as node classification, similarity search, and graph ...
Comments