Abstract
Exploring statistics of locally connected subgraph patterns (also known as network motifs) has helped researchers better understand the structure and function of biological and Online Social Networks (OSNs). Nowadays, the massive size of some critical networks—often stored in already overloaded relational databases—effectively limits the rate at which nodes and edges can be explored, making it a challenge to accurately discover subgraph statistics. In this work, we propose sampling methods to accurately estimate subgraph statistics from as few queried nodes as possible. We present sampling algorithms that efficiently and accurately estimate subgraph properties of massive networks. Our algorithms require no precomputation or complete network topology information. At the same time, we provide theoretical guarantees of convergence. We perform experiments using widely known datasets and show that, for the same accuracy, our algorithms require an order of magnitude less queries (samples) than the current state-of-the-art algorithms.
- István Albert and Réka Albert. 2004. Conserved network motifs allow protein--protein interaction prediction. Bioinformatics 4863, 13 (2004), 3346--3352. Google ScholarDigital Library
- Mansurul A. Bhuiyan, Mahmudur Rahman, Mahmuda Rahman, and Mohammad Al Hasan. 2012. GUISE: Uniform sampling of graphlets for large graph analysis. In Proceedings of IEEE ICDM 2012. 91--100. Google ScholarDigital Library
- Jin Chen, Wynne Hsu, Mong-Li Lee, and See-Kiong Ng. 2006. NeMoFinder: Dissecting genome-wide protein-protein interactions with meso-scale network motifs. In Proceedings of ACM SIGKDD 2006. 106--115. Google ScholarDigital Library
- James Cheng, Yiping Ke, Ada Wai-Chee Fu, Jeffrey Xu Yu, and Linhong Zhu. 2011. Finding maximal cliques in massive networks. ACM Transactions on Database Systems 36, 4 (Dec. 2011), 21:1--21:34. Google ScholarDigital Library
- Shumo Chu and James Cheng. 2012. Triangle listing in massive networks. ACM Transactions on Knowledge Discovery from Data 6, 4, Article 17 (Dec. 2012), 32 pages. Google ScholarDigital Library
- Hyunwoo Chun, Yong Yeol Ahn, Haewoon Kwak, Sue Moon, Young-ho Eom, and Hawoong Jeong. 2008. Comparison of online social relations in terms of volume vs. interaction: A case study of cyworld. In Proceedings of ACM SIGCOMM Internet Measurement Conference 2008. 57--59. Google ScholarDigital Library
- Minas Gjoka, Maciej Kurant, Carter T. Butts, and Athina Markopoulou. 2011. Practical recommendations on crawling online social networks. IEEE Journal on Selected Areas in Communications 29, 9 (2011), 1872--1892.Google ScholarCross Ref
- Minas Gjoka, Emily Smith, and Carter T. Butts. 2013. Estimating clique composition and size distributions from sampled network data. ArXiv e-prints (Aug. 2013).Google Scholar
- Stephen J. Hardiman and Liran Katzir. 2013. Estimating clustering coefficients and size of social networks via random walk. In Proceedings of WWW 2013. 539--550. Google ScholarDigital Library
- Mohammad Al Hasan and Mohammed J. Zaki. 2009. Output space sampling for graph patterns. In Proceedings of the VLDB Endowment 2009. 730--741. Google ScholarDigital Library
- Shalev Itzkovitz, Reuven Levitt, Nadav Kashtan, Ron Milo, Michael Itzkovitz, and Uri Alon. 2005. Coarse-graining and self-dissimilarity of complex networks. Physica Review E 71 (2005), 016127.Google ScholarCross Ref
- Galin L. Jones. 2004. On the Markov chain central limit theorem. Probability Surveys 1 (2004), 299--320.Google ScholarCross Ref
- Zahra Razaghi Moghadam Kashani, Hayedeh Ahrabian, Elahe Elahi, Abbas Nowzari-Dalini, Elnaz Saberi Ansari, Sahar Asadi, Shahin Mohammadi, Falk Schreiber, and Ali Masoudi-Nejad. 2009. Kavosh: A new algorithm for finding network motifs. BMC Bioinformatics 10 (2009), 318.Google ScholarCross Ref
- Nadav Kashtan, Shalev Itzkovitz, Ron Milo, and Uri Alon. 2004. Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics 20, 11 (2004), 1746--1758. Google ScholarDigital Library
- Liran Katzir, Edo Liberty, and Oren Somekh. 2011. Estimating sizes of social networks via biased sampling. In Proceedings of WWW 2011. 597--606. Google ScholarDigital Library
- Jérôme Kunegis, Andreas Lommatzsch, and Christian Bauckhage. 2009. The Slashdot zoo: Mining a social network with negative edges. In Proceedings of WWW 2009. 741--750. Google ScholarDigital Library
- Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media? In Proceedings of WWW 2010. 591--600. Google ScholarDigital Library
- Chul-Ho Lee, Xin Xu, and Do Young Eun. 2012. Beyond random walk and metropolis-hastings samplers: Why you should not backtrack for unbiased graph sampling. In Proceedings of ACM SIGMETRICS/Performance 2012. 319--330. Google ScholarDigital Library
- Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6, 1 (2009), 29--123.Google ScholarCross Ref
- L. Lovász. 1993. Random walks on graphs: A survey. Combinatorics 2 (1993), 1--46. Issue “Paul Erdös is Eighty.”Google Scholar
- Brendan D. McKay. 1981. Practical graph isomorphism. Congressus Numerantium 30 (1981), 45--87.Google Scholar
- Brendan D. McKay. 2009. Nauty User’s Guide, Version 2.4. Technical Report. Department of Computer Science, Australian National University.Google Scholar
- Sean Meyn and Richard L. Tweedie. 2009. Markov Chains and Stochastic Stability. Cambridge University Press. Google ScholarDigital Library
- R. Milo, et al. and Cell Biology. 2002. Network motifs: Simple building blocks of complex networks. Science 298, 5549 (Oct. 2002), 824--827.Google ScholarCross Ref
- Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and analysis of online social networks. In Proceedings of ACM SIGCOMM Internet Measurement Conference 2007. 29--42. Google ScholarDigital Library
- Michael Molloy and Bruce Reed. 1995. A critical point for random graphs with a given degree sequence. Random Structures and Algorithms 6, 2--3 (1995), 161--179. Google ScholarDigital Library
- Saeed Omidi, Falk Schreiber, and Ali Masoudi-nejad. 2009. MODA: An efficient algorithm for network motif discovery in biological networks. Genes and Genet Systems 84, 5 (2009), 385--395.Google ScholarCross Ref
- Bruno Ribeiro and Don Towsley. 2010. Estimating and sampling graphs with multidimensional random walks. In Proceedings of ACM SIGCOMM Internet Measurement Conference 2010. 390--403. Google ScholarDigital Library
- Bruno Ribeiro, Pinghui Wang, Fabricio Murai, and Don Towsley. 2012. Sampling directed graphs with random walks. In Proceedings of IEEE INFOCOM 2012. 1692--1700.Google ScholarCross Ref
- Bruno F. Ribeiro and Don Towsley. 2012. On the estimation accuracy of degree distributions from graph sampling. In CDC. 5240--5247.Google Scholar
- Matthew Richardson, Rakesh Agrawal, and Pedro Domingos. 2003. Trust management for the semantic web. In Proceedings of the 2nd International Semantic Web Conference. 351--368.Google ScholarDigital Library
- Gareth O. Roberts and Jeffrey S. Rosenthal. 2004. General state space markov chains and MCMC algorithms. Probability Surveys 1 (2004), 20--71.Google ScholarCross Ref
- Shai S. Shen-Orr, Ron Milo, Shmoolik Mangan, and Uri Alon. 2002. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics 31, 1 (May 2002), 64--68.Google ScholarCross Ref
- Lubos Takac and Michal Zabovsky. 2012. Data analysis in public social networks. In International Scientific Conference and International Workshop Present Day Trends of Innovations. 1--6.Google Scholar
- Johan Ugander, Lars Backstrom, and Jon Kleinberg. 2013. Subgraph frequencies: Mapping the empirical and extremal geography of large graph collections. In Proceedings of WWW 2013. 1307--1318. Google ScholarDigital Library
- Sebastian Wernicke. 2006. Efficient detection of network motifs. IEEE/ACM Transactions on Computational Biology and Bioinformatics 3, 4 (2006), 347--359. Google ScholarDigital Library
- Junzhou Zhao, John C. S. Lui, Don Towsley, Xiaohong Guan, and Yadong Zhou. 2011. Empirical analysis of the evolution of follower network: A case study on Douban. In Proceedings of IEEE INFOCOM NetSciCom 2011. 941--946.Google Scholar
Index Terms
- Efficiently Estimating Motif Statistics of Large Networks
Recommendations
Unbiased Characterization of Node Pairs over Large Graphs
TKDD Special Issue (SIGKDD'13)Characterizing user pair relationships is important for applications such as friend recommendation and interest targeting in online social networks (OSNs). Due to the large-scale nature of such networks, it is infeasible to enumerate all user pairs and ...
Walking on a graph with a magnifying glass: stratified sampling via weighted random walks
SIGMETRICS '11: Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systemsOur objective is to sample the node set of a large unknown graph via crawling, to accurately estimate a given metric of interest. We design a random walk on an appropriately defined weighted graph that achieves high efficiency by preferentially crawling ...
SAKE: Estimating Katz Centrality Based on Sampling for Large-Scale Social Networks
Katz centrality is a fundamental concept to measure the influence of a vertex in a social network. However, existing approaches to calculating Katz centrality in a large-scale network are unpractical and computationally expensive. In this article, we ...
Comments