Abstract
Large networks are becoming a widely used abstraction for studying complex systems in a broad set of disciplines, ranging from social-network analysis to molecular biology and neuroscience. Despite an increasing need to analyze and manipulate large networks, only a limited number of tools are available for this task.
Here, we describe the Stanford Network Analysis Platform (SNAP), a general-purpose, high-performance system that provides easy-to-use, high-level operations for analysis and manipulation of large networks. We present SNAP functionality, describe its implementational details, and give performance benchmarks. SNAP has been developed for single big-memory machines, and it balances the trade-off between maximum performance, compact in-memory graph representation, and the ability to handle dynamic graphs in which nodes and edges are being added or removed over time. SNAP can process massive networks with hundreds of millions of nodes and billions of edges. SNAP offers over 140 different graph algorithms that can efficiently manipulate large graphs, calculate structural properties, generate regular and random graphs, and handle attributes and metadata on nodes and edges. Besides being able to handle large graphs, an additional strength of SNAP is that networks and their attributes are fully dynamic; they can be modified during the computation at low cost. SNAP is provided as an open-source library in C++ as well as a module in Python.
We also describe the Stanford Large Network Dataset, a set of social and information real-world networks and datasets, which we make publicly available. The collection is a complementary resource to our SNAP software and is widely used for development and benchmarking of graph analytics algorithms.
- S. Arifuzzaman, M. Khan, and M. Marathe. 2013. PATRIC: A parallel algorithm for counting triangles in massive networks. In ACM International Conference on Information and Knowledge Management (CIKM). 529--538. Google ScholarDigital Library
- A.-L. Barabási and R. Albert. 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509--512.Google Scholar
- V. Batagelj and A. Mrvar. 1998. Pajek-program for large network analysis. Connections 21, 2 (1998), 47--57.Google Scholar
- V. Batagelj and M. Zaveršnik. 2002. Generalized cores. ArXiv cs.DS/0202039 (Feb 2002).Google Scholar
- A. A. Benczur, K. Csalogany, T. Sarlos, and M. Uher. 2005. Spamrank--fully automatic link spam detection. In International Workshop on Adversarial Information Retrieval on the Web.Google Scholar
- B. Bollobás. 1980. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European Journal of Combinatorics 1, 4 (1980), 311--316.Google ScholarCross Ref
- D. Chakrabarti, Y. Zhan, and C. Faloutsos. 2004. R-MAT: A recursive model for graph mining. In SIAM International Conference on Data Mining (SDM), Vol. 4. SIAM, 442--446.Google Scholar
- G. Csardi and T. Nepusz. 2006. The igraph software package for complex network research. InterJournal, Complex Systems 1695, 5 (2006).Google Scholar
- D. Easley and J. Kleinberg. 2010. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press. Google ScholarDigital Library
- A. D. Flaxman, A. M. Frieze, and J. Vera. 2006. A geometric preferential attachment model of networks. Internet Mathematics 3, 2 (2006), 187--205.Google ScholarCross Ref
- M. Gomez-Rodriguez, J. Leskovec, D. Balduzzi, and B. Schölkopf. 2014. Uncovering the structure and temporal dynamics of information propagation. Network Science 2, 01 (2014), 26--65.Google ScholarCross Ref
- M. Gomez-Rodriguez, J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, 1019--1028. Google ScholarDigital Library
- M. Gomez-Rodriguez, J. Leskovec, and A. Krause. 2012. Inferring networks of diffusion and influence. ACM Transactions on Knowledge Discovery from Data 5, 4, Article 21 (Feb. 2012), 37 pages. Google ScholarDigital Library
- M. Gomez-Rodriguez, J. Leskovec, and B. Schölkopf. 2013. Structure and dynamics of information pathways in online media. In ACM International Conference on Web Search and Data Mining (WSDM). ACM, 23--32. Google ScholarDigital Library
- J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), Vol. 12. 2. Google ScholarDigital Library
- D. Gregor and A. Lumsdaine. 2005. The parallel BGL: A generic library for distributed graph computations. Parallel Object-Oriented Scientific Computing (POOSC) 2 (2005), 1--18.Google Scholar
- A. Hagberg, P. Swart, and D. S. Chult. 2008. Exploring Network Structure, Dynamics, and Function using NetworkX. Technical Report. Los Alamos National Laboratory (LANL).Google Scholar
- D. Hallac, J. Leskovec, and S. Boyd. 2015. Network lasso: Clustering and optimization in large graphs. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, 387--396. Google ScholarDigital Library
- M. O. Jackson. 2008. Social and Economic Networks. Vol. 3. Princeton university press Princeton. Google Scholar
- U. Kang, C. E. Tsourakakis, and C. Faloutsos. 2009. Pegasus: A peta-scale graph mining system implementation and observations. In IEEE International Conference on Data Mining (ICDM). IEEE, 229--238. Google ScholarDigital Library
- J. Kim, W.-S. Han, S. Lee, K. Park, and H. Yu. 2014. OPT: A new framework for overlapped and parallel triangulation in large-scale graphs. In ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, 637--648. Google ScholarDigital Library
- M. Kim and J. Leskovec. 2011a. Modeling social networks with node attributes using the multiplicative attribute graph model. In Conference on Uncertainty in Artificial Intelligence (UAI).Google Scholar
- M. Kim and J. Leskovec. 2011b. The network completion problem: Inferring missing nodes and edges in networks. In SIAM International Conference on Data Mining (SDM). 47--58.Google Scholar
- M. Kim and J. Leskovec. 2012a. Latent multi-group membership graph model. In International Conference on Machine Learning (ICML).Google Scholar
- M. Kim and J. Leskovec. 2012b. Multiplicative attribute graph model of real-world networks. Internet Mathematics 8, 1--2 (2012), 113--160.Google ScholarCross Ref
- M. Kim and J. Leskovec. 2013. Nonparametric multi-group membership model for dynamic networks. In Advances in Neural Information Processing Systems (NIPS). 1385--1393.Google Scholar
- R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. 2000. Stochastic models for the web graph. In Annual Symposium on Foundations of Computer Science. IEEE, 57--65. Google ScholarDigital Library
- H. Kwak, C. Lee, H. Park, and S. Moon. 2010. What is twitter, a social network or a news media?. In WWW’10. Google ScholarDigital Library
- A. Kyrola, G. Blelloch, and C. Guestrin. 2012. GraphChi: Large-scale graph computation on just a PC. In USENIX Symposium on Operating Systems Design and Implementation (OSDI). 31--46. Google ScholarDigital Library
- J. Leskovec, L. Backstrom, and J. Kleinberg. 2009. Meme-tracking and the dynamics of the news cycle. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 497--506. Google ScholarDigital Library
- J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani. 2010. Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research 11 (2010), 985--1042. Google ScholarDigital Library
- J. Leskovec and E. Horvitz. 2014. Geospatial structure of a planetary-scale social network. IEEE Transactions on Computational Social Systems 1, 3 (2014), 156--163.Google ScholarCross Ref
- J. Leskovec, J. Kleinberg, and C. Faloutsos. 2005. Graphs over time: Densification laws, shrinking diameters and possible explanations. In ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD). ACM, 177--187. Google ScholarDigital Library
- J. Leskovec and A. Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data. (June 2014).Google Scholar
- P. Lofgren, S. Banerjee, and A. Goel. 2016. Personalized PageRank estimation and search: A bidirectional approach. In ACM International Conference on Web Search and Data Mining (WSDM). ACM. Google ScholarDigital Library
- P. A. Lofgren, S. Banerjee, A. Goel, and C Seshadhri. 2014. Fast-ppr: Scaling personalized pagerank estimation for large graphs. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, 1436--1445. Google ScholarDigital Library
- G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. 2010. Pregel: A system for large-scale graph processing. In ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, 135--146. Google ScholarDigital Library
- J. McAuley and J. Leskovec. 2012. Learning to discover social circles in ego networks. In Advances in Neural Information Processing Systems (NIPS).Google Scholar
- J. McAuley and J. Leskovec. 2014. Discovering social circles in ego networks. ACM Transactions on Knowledge Discovery from Data 8, 1, Article 4 (Feb. 2014), 28 pages. Google ScholarDigital Library
- R. Milo, N. Kashtan, S. Itzkovitz, M. E. J. Newman, and U. Alon. 2003. On the uniform generation of random graphs with prescribed degree sequences. arXiv preprint cond-mat/0312028 (2003).Google Scholar
- M. Newman. 2003. The structure and function of complex networks. SIAM Rev. 45, 2 (2003), 167--256.Google ScholarDigital Library
- M. Newman. 2010. Networks: An Introduction. OUP Oxford. Google ScholarDigital Library
- J. O’Madadhain, D. Fisher, P. Smyth, S. White, and Y. Boey. 2005. Analysis and visualization of network data using JUNG. Journal of Statistical Software 10, 2 (2005), 1--35.Google Scholar
- L. Page, S. Brin, R. Motwani, and T. Winograd. November 1999. The Pagerank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.Google Scholar
- Y. Perez, R. Sosič, A. Banerjee, R. Puttagunta, M. Raison, P. Shah, and J. Leskovec. 2015. Ringo: Interactive graph analytics on big-memory machines. In ACM SIGMOD International Conference on Management of Data (SIGMOD). 1105--1110. Google ScholarDigital Library
- E. Ravasz and A.-L. Barabási. 2003. Hierarchical organization in complex networks. Physical Review E 67, 2 (2003), 026112.Google ScholarCross Ref
- S. Salihoglu and J. Widom. 2013. GPS: A graph processing system. In International Conference on Scientific and Statistical Database Management. ACM, 22. Google ScholarDigital Library
- C. Suen, S. Huang, C. Eksombatchai, R. Sosič, and J. Leskovec. 2013. NIFTY: A system for large scale information flow tracking and clustering. In International Conference on World Wide Web (WWW). International World Wide Web Conferences Steering Committee, 1237--1248. Google ScholarDigital Library
- D. J. Watts and S. H. Strogatz. 1998. Collective dynamics of small-world networks. Nature 393, 6684 (1998), 440--442.Google Scholar
- R. S. Xin, J. E. Gonzalez, M. J. Franklin, and I. Stoica. 2013. GraphX: A resilient distributed graph system on Spark. In ACM International Workshop on Graph Data Management Experiences and Systems. ACM, 2. Google ScholarDigital Library
- J. Yang and J. Leskovec. 2012. Community-affiliation graph model for overlapping network community detection. In IEEE International Conference on Data Mining (ICDM). IEEE, 1170--1175. Google ScholarDigital Library
- J. Yang and J. Leskovec. 2013. Overlapping community detection at scale: A nonnegative matrix factorization approach. In ACM International Conference on Web Search and Data Mining (WSDM). ACM, 587--596. Google ScholarDigital Library
- J. Yang and J. Leskovec. 2014. Overlapping communities explain core-periphery organization of networks. Proc. IEEE 102, 12 (Dec 2014), 1892--1902.Google ScholarCross Ref
- J. Yang, J. McAuley, and J. Leskovec. 2013. Community detection in networks with node attributes. In IEEE International Conference on Data Mining (ICDM).Google Scholar
- J. Yang, J. McAuley, and J. Leskovec. 2014. Detecting cohesive and 2-mode communities in directed and undirected networks. In ACM International Conference on Web Search and Data Mining (WSDM). Google ScholarDigital Library
Index Terms
- SNAP: A General-Purpose Network Analysis and Graph-Mining Library
Recommendations
Large Scale Network Analytics with SNAP: Tutorial at the World Wide Web 2015 Conference
WWW '15 Companion: Proceedings of the 24th International Conference on World Wide WebMany techniques for the modeling, analysis and optimization of Web related datasets are based on studies of large scale networks, where a network can contain hundreds of millions of nodes and billions of edges. Network analysis tools must provide not ...
Vulnerability of super edge-connected networks
When the underlying topology of an interconnection network is modeled by a connected graph G, the connectivity of G is an important measurement for reliability and fault tolerance of the network. For a given integer h>=0, a subset F of edges in a ...
Ringo: Interactive Graph Analytics on Big-Memory Machines
SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of DataWe present Ringo, a system for analysis of large graphs. Graphs provide a way to represent and analyze systems of interacting objects (people, proteins, webpages) with edges between the objects denoting interactions (friendships, physical interactions, ...
Comments