research-article

node2vec: Scalable Feature Learning for Networks

Authors:
Aditya Grover

Stanford University, Stanford, CA, USA

Stanford University, Stanford, CA, USA
View Profile

,
Jure Leskovec

Stanford University, Stanford, CA, USA

Stanford University, Stanford, CA, USA
View Profile

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data MiningAugust 2016Pages 855–864https://doi.org/10.1145/2939672.2939754

Published:13 August 2016Publication History

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Pages 855–864

ABSTRACT

Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node's network neighborhood and design a biased random walk procedure, which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations.

We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way for efficiently learning state-of-the-art task-independent representations in complex networks.

References

L. A. Adamic and E. Adar. Friends and neighbors on the web. Social networks, 25(3):211--230, 2003.Google ScholarCross Ref
L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM, 2011. Google ScholarDigital Library
M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, 2001. Google ScholarDigital Library
Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE TPAMI, 35(8):1798--1828, 2013. Google ScholarDigital Library
B.-J. Breitkreutz, C. Stark, T. Reguly, L. Boucher, A. Breitkreutz, M. Livstone, R. Oughtred, D. H. Lackner, J. Bahler, V. Wood, et al. The BioGRID interaction database. Nucleic acids research, 36:D637--D640, 2008.Google Scholar
S. Cao, W. Lu, and Q. Xu. GraRep: Learning Graph Representations with global structural information. In CIKM, 2015. Google ScholarDigital Library
S. Fortunato. Community detection in graphs. Physics Reports, 486(3--5):75 -- 174, 2010.Google Scholar
B. Gallagher and T. Eliassi-Rad. Leveraging label-independent features for classification in sparsely labeled networks: An empirical study. In Lecture Notes in Computer Science: Advances in Social Network Mining and Analysis. Springer, 2009. Google ScholarDigital Library
Z. S. Harris. Word. Distributional Structure, 10(23):146--162, 1954.Google Scholar
K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L. Akoglu, D. Koutra, C. Faloutsos, and L. Li. RolX: structural role extraction & mining in large graphs. In KDD, 2012. Google ScholarDigital Library
K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, and C. Faloutsos. It's who you know: graph mining using recursive structural features. In KDD, 2011. Google ScholarDigital Library
P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. J. of the American Statistical Association, 2002.Google Scholar
D. E. Knuth. The Stanford GraphBase: a platform for combinatorial computing, volume 37. Addison-Wesley Reading, 1993. Google Scholar
J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.Google Scholar
K. Li, J. Gao, S. Guo, N. Du, X. Li, and A. Zhang. LRBM: A restricted boltzmann machine based approach for representation learning on linked data. In ICDM, 2014. Google ScholarDigital Library
X. Li, N. Du, H. Li, K. Li, J. Gao, and A. Zhang. A deep learning approach to link prediction in dynamic networks. In ICDM, 2014.Google ScholarCross Ref
Y. Li, D. Tarlow, M. Brockschmidt, and R. Zemel. Gated graph sequence neural networks. In ICLR, 2016.Google Scholar
D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. J. of the American society for information science and technology, 58(7):1019--1031, 2007. Google ScholarDigital Library
A. Liberzon, A. Subramanian, R. Pinchback, H. Thorvaldsdóttir, P. Tamayo, and J. P. Mesirov. Molecular signatures database (MSigDB) 3.0. Bioinformatics, 27(12):1739--1740, 2011. Google ScholarDigital Library
M. Mahoney. Large text compression benchmark. www.mattmahoney.net/dc/textdata, 2011.Google Scholar
T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. In ICLR, 2013.Google Scholar
T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.Google ScholarDigital Library
J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In EMNLP, 2014.Google ScholarCross Ref
B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: Online learning of social representations. In KDD, 2014. Google ScholarDigital Library
P. Radivojac, W. T. Clark, T. R. Oron, A. M. Schnoes, T. Wittkop, A. Sokolov, K. Graim, C. Funk, Verspoor, et al. A large-scale evaluation of computational protein function prediction. Nature methods, 10(3):221--227, 2013.Google ScholarCross Ref
B. Recht, C. Re, S. Wright, and F. Niu. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In NIPS, 2011.Google Scholar
S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323--2326, 2000.Google ScholarCross Ref
J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: Large-scale Information Network Embedding. In WWW, 2015. Google ScholarDigital Library
L. Tang and H. Liu. Leveraging social media networks for classification. Data Mining and Knowledge Discovery, 23(3):447--478, 2011. Google ScholarDigital Library
J. B. Tenenbaum, V. De Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319--2323, 2000.Google ScholarCross Ref
F. Tian, B. Gao, Q. Cui, E. Chen, and T.-Y. Liu. Learning deep representations for graph clustering. In AAAI, 2014. Google ScholarDigital Library
K. Toutanova, D. Klein, C. D. Manning, and Y. Singer. Feature-rich part-of-speech tagging with a cyclic dependency network. In NAACL, 2003. Google ScholarDigital Library
G. Tsoumakas and I. Katakis. Multi-label classification: An overview. Dept. of Informatics, Aristotle University of Thessaloniki, Greece, 2006.Google Scholar
A. Vazquez, A. Flammini, A. Maritan, and A. Vespignani. Global protein function prediction from protein-protein interaction networks. Nature biotechnology, 21(6):697--700, 2003.Google ScholarCross Ref
S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin. Graph embedding and extensions: a general framework for dimensionality reduction. IEEE TPAMI, 29(1):40--51, 2007. Google ScholarDigital Library
J. Yang and J. Leskovec. Overlapping communities explain core-periphery organization of networks. Proceedings of the IEEE, 102(12):1892--1902, 2014.Google ScholarCross Ref
S.-H. Yang, B. Long, A. Smola, N. Sadagopan, Z. Zheng, and H. Zha. Like like alike: joint friendship and interest propagation in social networks. In WWW, 2011. Google ScholarDigital Library
R. Zafarani and H. Liu. Social computing data repository at ASU, 2009.Google Scholar
S. Zhai and Z. Zhang. Dropout training of matrix factorization and autoencoder for link prediction in sparse graphs. In SDM, 2015.Google ScholarCross Ref

Index Terms

node2vec: Scalable Feature Learning for Networks
1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
  2. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Dimensionality reduction and manifold learning
    2. Machine learning approaches
      1. Learning latent representations
2. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

A network representation learning method based on topology
Abstract
In the network, nodes with similar neighborhood topology often have similar functions and play similar roles. Accurately identifying the role of nodes is the key to our understanding of the network and then facilitates the completion ...
Read More
Higher-order Network Representation Learning
WWW '18: Companion Proceedings of the The Web Conference 2018

This paper describes a general framework for learning Higher-Order Network Embeddings (HONE) from graph data based on network motifs. The HONE framework is highly expressive and flexible with many interchangeable components. The experimental results ...
Read More
VERSE: Versatile Graph Embeddings from Similarity Measures
WWW '18: Proceedings of the 2018 World Wide Web Conference

Embedding a web-scale information network into a low-dimensional vector space facilitates tasks such as link prediction, classification, and visualization. Past research has addressed the problem of extracting such embeddings by adopting methods from ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN:9781450342322
DOI:10.1145/2939672
General Chairs:
Balaji Krishnapuram
IBM
,
Mohak Shah
Bosch
,
Program Chairs:
Alex Smola
Amazon
,
Charu Aggarwal
IBM
,
Dou Shen
Baidu
,
Rajeev Rastogi
Amazon
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 13 August 2016
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
feature learning
graph representations
information networks
node embeddings
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '16 Paper Acceptance Rate66of1,115submissions,6%Overall Acceptance Rate1,133of8,635submissions,13%
More
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 6,196
  Total Citations
  View Citations
- 20,798
  Total Downloads
- Downloads (Last 12 months)2,694
- Downloads (Last 6 weeks)333
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

node2vec: Scalable Feature Learning for Networks

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

A network representation learning method based on topology

Higher-order Network Representation Learning

VERSE: Versatile Graph Embeddings from Similarity Measures