skip to main content
research-article
Public Access

SNAP: A General-Purpose Network Analysis and Graph-Mining Library

Published:15 July 2016Publication History
Skip Abstract Section

Abstract

Large networks are becoming a widely used abstraction for studying complex systems in a broad set of disciplines, ranging from social-network analysis to molecular biology and neuroscience. Despite an increasing need to analyze and manipulate large networks, only a limited number of tools are available for this task.

Here, we describe the Stanford Network Analysis Platform (SNAP), a general-purpose, high-performance system that provides easy-to-use, high-level operations for analysis and manipulation of large networks. We present SNAP functionality, describe its implementational details, and give performance benchmarks. SNAP has been developed for single big-memory machines, and it balances the trade-off between maximum performance, compact in-memory graph representation, and the ability to handle dynamic graphs in which nodes and edges are being added or removed over time. SNAP can process massive networks with hundreds of millions of nodes and billions of edges. SNAP offers over 140 different graph algorithms that can efficiently manipulate large graphs, calculate structural properties, generate regular and random graphs, and handle attributes and metadata on nodes and edges. Besides being able to handle large graphs, an additional strength of SNAP is that networks and their attributes are fully dynamic; they can be modified during the computation at low cost. SNAP is provided as an open-source library in C++ as well as a module in Python.

We also describe the Stanford Large Network Dataset, a set of social and information real-world networks and datasets, which we make publicly available. The collection is a complementary resource to our SNAP software and is widely used for development and benchmarking of graph analytics algorithms.

References

  1. S. Arifuzzaman, M. Khan, and M. Marathe. 2013. PATRIC: A parallel algorithm for counting triangles in massive networks. In ACM International Conference on Information and Knowledge Management (CIKM). 529--538. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. A.-L. Barabási and R. Albert. 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509--512.Google ScholarGoogle Scholar
  3. V. Batagelj and A. Mrvar. 1998. Pajek-program for large network analysis. Connections 21, 2 (1998), 47--57.Google ScholarGoogle Scholar
  4. V. Batagelj and M. Zaveršnik. 2002. Generalized cores. ArXiv cs.DS/0202039 (Feb 2002).Google ScholarGoogle Scholar
  5. A. A. Benczur, K. Csalogany, T. Sarlos, and M. Uher. 2005. Spamrank--fully automatic link spam detection. In International Workshop on Adversarial Information Retrieval on the Web.Google ScholarGoogle Scholar
  6. B. Bollobás. 1980. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European Journal of Combinatorics 1, 4 (1980), 311--316.Google ScholarGoogle ScholarCross RefCross Ref
  7. D. Chakrabarti, Y. Zhan, and C. Faloutsos. 2004. R-MAT: A recursive model for graph mining. In SIAM International Conference on Data Mining (SDM), Vol. 4. SIAM, 442--446.Google ScholarGoogle Scholar
  8. G. Csardi and T. Nepusz. 2006. The igraph software package for complex network research. InterJournal, Complex Systems 1695, 5 (2006).Google ScholarGoogle Scholar
  9. D. Easley and J. Kleinberg. 2010. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. D. Flaxman, A. M. Frieze, and J. Vera. 2006. A geometric preferential attachment model of networks. Internet Mathematics 3, 2 (2006), 187--205.Google ScholarGoogle ScholarCross RefCross Ref
  11. M. Gomez-Rodriguez, J. Leskovec, D. Balduzzi, and B. Schölkopf. 2014. Uncovering the structure and temporal dynamics of information propagation. Network Science 2, 01 (2014), 26--65.Google ScholarGoogle ScholarCross RefCross Ref
  12. M. Gomez-Rodriguez, J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, 1019--1028. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Gomez-Rodriguez, J. Leskovec, and A. Krause. 2012. Inferring networks of diffusion and influence. ACM Transactions on Knowledge Discovery from Data 5, 4, Article 21 (Feb. 2012), 37 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. Gomez-Rodriguez, J. Leskovec, and B. Schölkopf. 2013. Structure and dynamics of information pathways in online media. In ACM International Conference on Web Search and Data Mining (WSDM). ACM, 23--32. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. 2012. PowerGraph: Distributed graph-parallel computation on natural graphs. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), Vol. 12. 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Gregor and A. Lumsdaine. 2005. The parallel BGL: A generic library for distributed graph computations. Parallel Object-Oriented Scientific Computing (POOSC) 2 (2005), 1--18.Google ScholarGoogle Scholar
  17. A. Hagberg, P. Swart, and D. S. Chult. 2008. Exploring Network Structure, Dynamics, and Function using NetworkX. Technical Report. Los Alamos National Laboratory (LANL).Google ScholarGoogle Scholar
  18. D. Hallac, J. Leskovec, and S. Boyd. 2015. Network lasso: Clustering and optimization in large graphs. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, 387--396. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. O. Jackson. 2008. Social and Economic Networks. Vol. 3. Princeton university press Princeton. Google ScholarGoogle Scholar
  20. U. Kang, C. E. Tsourakakis, and C. Faloutsos. 2009. Pegasus: A peta-scale graph mining system implementation and observations. In IEEE International Conference on Data Mining (ICDM). IEEE, 229--238. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Kim, W.-S. Han, S. Lee, K. Park, and H. Yu. 2014. OPT: A new framework for overlapped and parallel triangulation in large-scale graphs. In ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, 637--648. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Kim and J. Leskovec. 2011a. Modeling social networks with node attributes using the multiplicative attribute graph model. In Conference on Uncertainty in Artificial Intelligence (UAI).Google ScholarGoogle Scholar
  23. M. Kim and J. Leskovec. 2011b. The network completion problem: Inferring missing nodes and edges in networks. In SIAM International Conference on Data Mining (SDM). 47--58.Google ScholarGoogle Scholar
  24. M. Kim and J. Leskovec. 2012a. Latent multi-group membership graph model. In International Conference on Machine Learning (ICML).Google ScholarGoogle Scholar
  25. M. Kim and J. Leskovec. 2012b. Multiplicative attribute graph model of real-world networks. Internet Mathematics 8, 1--2 (2012), 113--160.Google ScholarGoogle ScholarCross RefCross Ref
  26. M. Kim and J. Leskovec. 2013. Nonparametric multi-group membership model for dynamic networks. In Advances in Neural Information Processing Systems (NIPS). 1385--1393.Google ScholarGoogle Scholar
  27. R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. 2000. Stochastic models for the web graph. In Annual Symposium on Foundations of Computer Science. IEEE, 57--65. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. H. Kwak, C. Lee, H. Park, and S. Moon. 2010. What is twitter, a social network or a news media?. In WWW’10. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. A. Kyrola, G. Blelloch, and C. Guestrin. 2012. GraphChi: Large-scale graph computation on just a PC. In USENIX Symposium on Operating Systems Design and Implementation (OSDI). 31--46. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. J. Leskovec, L. Backstrom, and J. Kleinberg. 2009. Meme-tracking and the dynamics of the news cycle. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 497--506. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani. 2010. Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research 11 (2010), 985--1042. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. J. Leskovec and E. Horvitz. 2014. Geospatial structure of a planetary-scale social network. IEEE Transactions on Computational Social Systems 1, 3 (2014), 156--163.Google ScholarGoogle ScholarCross RefCross Ref
  33. J. Leskovec, J. Kleinberg, and C. Faloutsos. 2005. Graphs over time: Densification laws, shrinking diameters and possible explanations. In ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD). ACM, 177--187. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. J. Leskovec and A. Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data. (June 2014).Google ScholarGoogle Scholar
  35. P. Lofgren, S. Banerjee, and A. Goel. 2016. Personalized PageRank estimation and search: A bidirectional approach. In ACM International Conference on Web Search and Data Mining (WSDM). ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. P. A. Lofgren, S. Banerjee, A. Goel, and C Seshadhri. 2014. Fast-ppr: Scaling personalized pagerank estimation for large graphs. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, 1436--1445. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. 2010. Pregel: A system for large-scale graph processing. In ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, 135--146. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. J. McAuley and J. Leskovec. 2012. Learning to discover social circles in ego networks. In Advances in Neural Information Processing Systems (NIPS).Google ScholarGoogle Scholar
  39. J. McAuley and J. Leskovec. 2014. Discovering social circles in ego networks. ACM Transactions on Knowledge Discovery from Data 8, 1, Article 4 (Feb. 2014), 28 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. R. Milo, N. Kashtan, S. Itzkovitz, M. E. J. Newman, and U. Alon. 2003. On the uniform generation of random graphs with prescribed degree sequences. arXiv preprint cond-mat/0312028 (2003).Google ScholarGoogle Scholar
  41. M. Newman. 2003. The structure and function of complex networks. SIAM Rev. 45, 2 (2003), 167--256.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. M. Newman. 2010. Networks: An Introduction. OUP Oxford. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. J. O’Madadhain, D. Fisher, P. Smyth, S. White, and Y. Boey. 2005. Analysis and visualization of network data using JUNG. Journal of Statistical Software 10, 2 (2005), 1--35.Google ScholarGoogle Scholar
  44. L. Page, S. Brin, R. Motwani, and T. Winograd. November 1999. The Pagerank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.Google ScholarGoogle Scholar
  45. Y. Perez, R. Sosič, A. Banerjee, R. Puttagunta, M. Raison, P. Shah, and J. Leskovec. 2015. Ringo: Interactive graph analytics on big-memory machines. In ACM SIGMOD International Conference on Management of Data (SIGMOD). 1105--1110. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. E. Ravasz and A.-L. Barabási. 2003. Hierarchical organization in complex networks. Physical Review E 67, 2 (2003), 026112.Google ScholarGoogle ScholarCross RefCross Ref
  47. S. Salihoglu and J. Widom. 2013. GPS: A graph processing system. In International Conference on Scientific and Statistical Database Management. ACM, 22. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. C. Suen, S. Huang, C. Eksombatchai, R. Sosič, and J. Leskovec. 2013. NIFTY: A system for large scale information flow tracking and clustering. In International Conference on World Wide Web (WWW). International World Wide Web Conferences Steering Committee, 1237--1248. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. D. J. Watts and S. H. Strogatz. 1998. Collective dynamics of small-world networks. Nature 393, 6684 (1998), 440--442.Google ScholarGoogle Scholar
  50. R. S. Xin, J. E. Gonzalez, M. J. Franklin, and I. Stoica. 2013. GraphX: A resilient distributed graph system on Spark. In ACM International Workshop on Graph Data Management Experiences and Systems. ACM,  2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. J. Yang and J. Leskovec. 2012. Community-affiliation graph model for overlapping network community detection. In IEEE International Conference on Data Mining (ICDM). IEEE, 1170--1175. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. J. Yang and J. Leskovec. 2013. Overlapping community detection at scale: A nonnegative matrix factorization approach. In ACM International Conference on Web Search and Data Mining (WSDM). ACM, 587--596. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. J. Yang and J. Leskovec. 2014. Overlapping communities explain core-periphery organization of networks. Proc. IEEE 102, 12 (Dec 2014), 1892--1902.Google ScholarGoogle ScholarCross RefCross Ref
  54. J. Yang, J. McAuley, and J. Leskovec. 2013. Community detection in networks with node attributes. In IEEE International Conference on Data Mining (ICDM).Google ScholarGoogle Scholar
  55. J. Yang, J. McAuley, and J. Leskovec. 2014. Detecting cohesive and 2-mode communities in directed and undirected networks. In ACM International Conference on Web Search and Data Mining (WSDM). Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. SNAP: A General-Purpose Network Analysis and Graph-Mining Library

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM Transactions on Intelligent Systems and Technology
          ACM Transactions on Intelligent Systems and Technology  Volume 8, Issue 1
          January 2017
          363 pages
          ISSN:2157-6904
          EISSN:2157-6912
          DOI:10.1145/2973184
          • Editor:
          • Yu Zheng
          Issue’s Table of Contents

          Copyright © 2016 ACM

          Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 15 July 2016
          • Received: 1 February 2016
          • Accepted: 1 February 2016
          Published in tist Volume 8, Issue 1

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader