skip to main content
research-article

Efficiently Estimating Motif Statistics of Large Networks

Published:23 September 2014Publication History
Skip Abstract Section

Abstract

Exploring statistics of locally connected subgraph patterns (also known as network motifs) has helped researchers better understand the structure and function of biological and Online Social Networks (OSNs). Nowadays, the massive size of some critical networks—often stored in already overloaded relational databases—effectively limits the rate at which nodes and edges can be explored, making it a challenge to accurately discover subgraph statistics. In this work, we propose sampling methods to accurately estimate subgraph statistics from as few queried nodes as possible. We present sampling algorithms that efficiently and accurately estimate subgraph properties of massive networks. Our algorithms require no precomputation or complete network topology information. At the same time, we provide theoretical guarantees of convergence. We perform experiments using widely known datasets and show that, for the same accuracy, our algorithms require an order of magnitude less queries (samples) than the current state-of-the-art algorithms.

References

  1. István Albert and Réka Albert. 2004. Conserved network motifs allow protein--protein interaction prediction. Bioinformatics 4863, 13 (2004), 3346--3352. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Mansurul A. Bhuiyan, Mahmudur Rahman, Mahmuda Rahman, and Mohammad Al Hasan. 2012. GUISE: Uniform sampling of graphlets for large graph analysis. In Proceedings of IEEE ICDM 2012. 91--100. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Jin Chen, Wynne Hsu, Mong-Li Lee, and See-Kiong Ng. 2006. NeMoFinder: Dissecting genome-wide protein-protein interactions with meso-scale network motifs. In Proceedings of ACM SIGKDD 2006. 106--115. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. James Cheng, Yiping Ke, Ada Wai-Chee Fu, Jeffrey Xu Yu, and Linhong Zhu. 2011. Finding maximal cliques in massive networks. ACM Transactions on Database Systems 36, 4 (Dec. 2011), 21:1--21:34. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Shumo Chu and James Cheng. 2012. Triangle listing in massive networks. ACM Transactions on Knowledge Discovery from Data 6, 4, Article 17 (Dec. 2012), 32 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Hyunwoo Chun, Yong Yeol Ahn, Haewoon Kwak, Sue Moon, Young-ho Eom, and Hawoong Jeong. 2008. Comparison of online social relations in terms of volume vs. interaction: A case study of cyworld. In Proceedings of ACM SIGCOMM Internet Measurement Conference 2008. 57--59. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Minas Gjoka, Maciej Kurant, Carter T. Butts, and Athina Markopoulou. 2011. Practical recommendations on crawling online social networks. IEEE Journal on Selected Areas in Communications 29, 9 (2011), 1872--1892.Google ScholarGoogle ScholarCross RefCross Ref
  8. Minas Gjoka, Emily Smith, and Carter T. Butts. 2013. Estimating clique composition and size distributions from sampled network data. ArXiv e-prints (Aug. 2013).Google ScholarGoogle Scholar
  9. Stephen J. Hardiman and Liran Katzir. 2013. Estimating clustering coefficients and size of social networks via random walk. In Proceedings of WWW 2013. 539--550. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Mohammad Al Hasan and Mohammed J. Zaki. 2009. Output space sampling for graph patterns. In Proceedings of the VLDB Endowment 2009. 730--741. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Shalev Itzkovitz, Reuven Levitt, Nadav Kashtan, Ron Milo, Michael Itzkovitz, and Uri Alon. 2005. Coarse-graining and self-dissimilarity of complex networks. Physica Review E 71 (2005), 016127.Google ScholarGoogle ScholarCross RefCross Ref
  12. Galin L. Jones. 2004. On the Markov chain central limit theorem. Probability Surveys 1 (2004), 299--320.Google ScholarGoogle ScholarCross RefCross Ref
  13. Zahra Razaghi Moghadam Kashani, Hayedeh Ahrabian, Elahe Elahi, Abbas Nowzari-Dalini, Elnaz Saberi Ansari, Sahar Asadi, Shahin Mohammadi, Falk Schreiber, and Ali Masoudi-Nejad. 2009. Kavosh: A new algorithm for finding network motifs. BMC Bioinformatics 10 (2009), 318.Google ScholarGoogle ScholarCross RefCross Ref
  14. Nadav Kashtan, Shalev Itzkovitz, Ron Milo, and Uri Alon. 2004. Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics 20, 11 (2004), 1746--1758. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Liran Katzir, Edo Liberty, and Oren Somekh. 2011. Estimating sizes of social networks via biased sampling. In Proceedings of WWW 2011. 597--606. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Jérôme Kunegis, Andreas Lommatzsch, and Christian Bauckhage. 2009. The Slashdot zoo: Mining a social network with negative edges. In Proceedings of WWW 2009. 741--750. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media? In Proceedings of WWW 2010. 591--600. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Chul-Ho Lee, Xin Xu, and Do Young Eun. 2012. Beyond random walk and metropolis-hastings samplers: Why you should not backtrack for unbiased graph sampling. In Proceedings of ACM SIGMETRICS/Performance 2012. 319--330. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6, 1 (2009), 29--123.Google ScholarGoogle ScholarCross RefCross Ref
  20. L. Lovász. 1993. Random walks on graphs: A survey. Combinatorics 2 (1993), 1--46. Issue “Paul Erdös is Eighty.”Google ScholarGoogle Scholar
  21. Brendan D. McKay. 1981. Practical graph isomorphism. Congressus Numerantium 30 (1981), 45--87.Google ScholarGoogle Scholar
  22. Brendan D. McKay. 2009. Nauty User’s Guide, Version 2.4. Technical Report. Department of Computer Science, Australian National University.Google ScholarGoogle Scholar
  23. Sean Meyn and Richard L. Tweedie. 2009. Markov Chains and Stochastic Stability. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. R. Milo, et al. and Cell Biology. 2002. Network motifs: Simple building blocks of complex networks. Science 298, 5549 (Oct. 2002), 824--827.Google ScholarGoogle ScholarCross RefCross Ref
  25. Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and analysis of online social networks. In Proceedings of ACM SIGCOMM Internet Measurement Conference 2007. 29--42. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Michael Molloy and Bruce Reed. 1995. A critical point for random graphs with a given degree sequence. Random Structures and Algorithms 6, 2--3 (1995), 161--179. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Saeed Omidi, Falk Schreiber, and Ali Masoudi-nejad. 2009. MODA: An efficient algorithm for network motif discovery in biological networks. Genes and Genet Systems 84, 5 (2009), 385--395.Google ScholarGoogle ScholarCross RefCross Ref
  28. Bruno Ribeiro and Don Towsley. 2010. Estimating and sampling graphs with multidimensional random walks. In Proceedings of ACM SIGCOMM Internet Measurement Conference 2010. 390--403. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Bruno Ribeiro, Pinghui Wang, Fabricio Murai, and Don Towsley. 2012. Sampling directed graphs with random walks. In Proceedings of IEEE INFOCOM 2012. 1692--1700.Google ScholarGoogle ScholarCross RefCross Ref
  30. Bruno F. Ribeiro and Don Towsley. 2012. On the estimation accuracy of degree distributions from graph sampling. In CDC. 5240--5247.Google ScholarGoogle Scholar
  31. Matthew Richardson, Rakesh Agrawal, and Pedro Domingos. 2003. Trust management for the semantic web. In Proceedings of the 2nd International Semantic Web Conference. 351--368.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Gareth O. Roberts and Jeffrey S. Rosenthal. 2004. General state space markov chains and MCMC algorithms. Probability Surveys 1 (2004), 20--71.Google ScholarGoogle ScholarCross RefCross Ref
  33. Shai S. Shen-Orr, Ron Milo, Shmoolik Mangan, and Uri Alon. 2002. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics 31, 1 (May 2002), 64--68.Google ScholarGoogle ScholarCross RefCross Ref
  34. Lubos Takac and Michal Zabovsky. 2012. Data analysis in public social networks. In International Scientific Conference and International Workshop Present Day Trends of Innovations. 1--6.Google ScholarGoogle Scholar
  35. Johan Ugander, Lars Backstrom, and Jon Kleinberg. 2013. Subgraph frequencies: Mapping the empirical and extremal geography of large graph collections. In Proceedings of WWW 2013. 1307--1318. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Sebastian Wernicke. 2006. Efficient detection of network motifs. IEEE/ACM Transactions on Computational Biology and Bioinformatics 3, 4 (2006), 347--359. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Junzhou Zhao, John C. S. Lui, Don Towsley, Xiaohong Guan, and Yadong Zhou. 2011. Empirical analysis of the evolution of follower network: A case study on Douban. In Proceedings of IEEE INFOCOM NetSciCom 2011. 941--946.Google ScholarGoogle Scholar

Index Terms

  1. Efficiently Estimating Motif Statistics of Large Networks

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Knowledge Discovery from Data
      ACM Transactions on Knowledge Discovery from Data  Volume 9, Issue 2
      November 2014
      193 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/2672614
      Issue’s Table of Contents

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 23 September 2014
      • Accepted: 1 March 2014
      • Revised: 1 January 2014
      • Received: 1 October 2013
      Published in tkdd Volume 9, Issue 2

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader