research-article

Efficiently Estimating Motif Statistics of Large Networks

Authors:
Pinghui Wang

Huawei Noah's Ark Lab, Hong Kong

Huawei Noah's Ark Lab, Hong Kong
View Profile

,
John C. S. Lui

The Chinese University of Hong Kong, Shatin, Hong Kong

The Chinese University of Hong Kong, Shatin, Hong Kong
View Profile

,
Bruno Ribeiro

Carnegie Mellon University, Pittsburgh, PA, USA

Carnegie Mellon University, Pittsburgh, PA, USA
View Profile

,
Don Towsley

University of Massachusetts Amherst, MA, USA

University of Massachusetts Amherst, MA, USA
View Profile

,
Junzhou Zhao

Xi'an Jiaotong University, China

Xi'an Jiaotong University, China
View Profile

,
Xiaohong Guan

Xi'an Jiaotong University, China

Xi'an Jiaotong University, China
View Profile

Authors Info & Claims

ACM Transactions on Knowledge Discovery from Data Volume 9 Issue 2Article No.: 8pp 1–27https://doi.org/10.1145/2629564

Published:23 September 2014Publication History

ACM Transactions on Knowledge Discovery from Data

Abstract

Exploring statistics of locally connected subgraph patterns (also known as network motifs) has helped researchers better understand the structure and function of biological and Online Social Networks (OSNs). Nowadays, the massive size of some critical networks—often stored in already overloaded relational databases—effectively limits the rate at which nodes and edges can be explored, making it a challenge to accurately discover subgraph statistics. In this work, we propose sampling methods to accurately estimate subgraph statistics from as few queried nodes as possible. We present sampling algorithms that efficiently and accurately estimate subgraph properties of massive networks. Our algorithms require no precomputation or complete network topology information. At the same time, we provide theoretical guarantees of convergence. We perform experiments using widely known datasets and show that, for the same accuracy, our algorithms require an order of magnitude less queries (samples) than the current state-of-the-art algorithms.

References

István Albert and Réka Albert. 2004. Conserved network motifs allow protein--protein interaction prediction. Bioinformatics 4863, 13 (2004), 3346--3352. Google ScholarDigital Library
Mansurul A. Bhuiyan, Mahmudur Rahman, Mahmuda Rahman, and Mohammad Al Hasan. 2012. GUISE: Uniform sampling of graphlets for large graph analysis. In Proceedings of IEEE ICDM 2012. 91--100. Google ScholarDigital Library
Jin Chen, Wynne Hsu, Mong-Li Lee, and See-Kiong Ng. 2006. NeMoFinder: Dissecting genome-wide protein-protein interactions with meso-scale network motifs. In Proceedings of ACM SIGKDD 2006. 106--115. Google ScholarDigital Library
James Cheng, Yiping Ke, Ada Wai-Chee Fu, Jeffrey Xu Yu, and Linhong Zhu. 2011. Finding maximal cliques in massive networks. ACM Transactions on Database Systems 36, 4 (Dec. 2011), 21:1--21:34. Google ScholarDigital Library
Shumo Chu and James Cheng. 2012. Triangle listing in massive networks. ACM Transactions on Knowledge Discovery from Data 6, 4, Article 17 (Dec. 2012), 32 pages. Google ScholarDigital Library
Hyunwoo Chun, Yong Yeol Ahn, Haewoon Kwak, Sue Moon, Young-ho Eom, and Hawoong Jeong. 2008. Comparison of online social relations in terms of volume vs. interaction: A case study of cyworld. In Proceedings of ACM SIGCOMM Internet Measurement Conference 2008. 57--59. Google ScholarDigital Library
Minas Gjoka, Maciej Kurant, Carter T. Butts, and Athina Markopoulou. 2011. Practical recommendations on crawling online social networks. IEEE Journal on Selected Areas in Communications 29, 9 (2011), 1872--1892.Google ScholarCross Ref
Minas Gjoka, Emily Smith, and Carter T. Butts. 2013. Estimating clique composition and size distributions from sampled network data. ArXiv e-prints (Aug. 2013).Google Scholar
Stephen J. Hardiman and Liran Katzir. 2013. Estimating clustering coefficients and size of social networks via random walk. In Proceedings of WWW 2013. 539--550. Google ScholarDigital Library
Mohammad Al Hasan and Mohammed J. Zaki. 2009. Output space sampling for graph patterns. In Proceedings of the VLDB Endowment 2009. 730--741. Google ScholarDigital Library
Shalev Itzkovitz, Reuven Levitt, Nadav Kashtan, Ron Milo, Michael Itzkovitz, and Uri Alon. 2005. Coarse-graining and self-dissimilarity of complex networks. Physica Review E 71 (2005), 016127.Google ScholarCross Ref
Galin L. Jones. 2004. On the Markov chain central limit theorem. Probability Surveys 1 (2004), 299--320.Google ScholarCross Ref
Zahra Razaghi Moghadam Kashani, Hayedeh Ahrabian, Elahe Elahi, Abbas Nowzari-Dalini, Elnaz Saberi Ansari, Sahar Asadi, Shahin Mohammadi, Falk Schreiber, and Ali Masoudi-Nejad. 2009. Kavosh: A new algorithm for finding network motifs. BMC Bioinformatics 10 (2009), 318.Google ScholarCross Ref
Nadav Kashtan, Shalev Itzkovitz, Ron Milo, and Uri Alon. 2004. Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics 20, 11 (2004), 1746--1758. Google ScholarDigital Library
Liran Katzir, Edo Liberty, and Oren Somekh. 2011. Estimating sizes of social networks via biased sampling. In Proceedings of WWW 2011. 597--606. Google ScholarDigital Library
Jérôme Kunegis, Andreas Lommatzsch, and Christian Bauckhage. 2009. The Slashdot zoo: Mining a social network with negative edges. In Proceedings of WWW 2009. 741--750. Google ScholarDigital Library
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media&quest; In Proceedings of WWW 2010. 591--600. Google ScholarDigital Library
Chul-Ho Lee, Xin Xu, and Do Young Eun. 2012. Beyond random walk and metropolis-hastings samplers: Why you should not backtrack for unbiased graph sampling. In Proceedings of ACM SIGMETRICS/Performance 2012. 319--330. Google ScholarDigital Library
Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. 2009. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6, 1 (2009), 29--123.Google ScholarCross Ref
L. Lovász. 1993. Random walks on graphs: A survey. Combinatorics 2 (1993), 1--46. Issue “Paul Erdös is Eighty.”Google Scholar
Brendan D. McKay. 1981. Practical graph isomorphism. Congressus Numerantium 30 (1981), 45--87.Google Scholar
Brendan D. McKay. 2009. Nauty User’s Guide, Version 2.4. Technical Report. Department of Computer Science, Australian National University.Google Scholar
Sean Meyn and Richard L. Tweedie. 2009. Markov Chains and Stochastic Stability. Cambridge University Press. Google ScholarDigital Library
R. Milo, et al. and Cell Biology. 2002. Network motifs: Simple building blocks of complex networks. Science 298, 5549 (Oct. 2002), 824--827.Google ScholarCross Ref
Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. 2007. Measurement and analysis of online social networks. In Proceedings of ACM SIGCOMM Internet Measurement Conference 2007. 29--42. Google ScholarDigital Library
Michael Molloy and Bruce Reed. 1995. A critical point for random graphs with a given degree sequence. Random Structures and Algorithms 6, 2--3 (1995), 161--179. Google ScholarDigital Library
Saeed Omidi, Falk Schreiber, and Ali Masoudi-nejad. 2009. MODA: An efficient algorithm for network motif discovery in biological networks. Genes and Genet Systems 84, 5 (2009), 385--395.Google ScholarCross Ref
Bruno Ribeiro and Don Towsley. 2010. Estimating and sampling graphs with multidimensional random walks. In Proceedings of ACM SIGCOMM Internet Measurement Conference 2010. 390--403. Google ScholarDigital Library
Bruno Ribeiro, Pinghui Wang, Fabricio Murai, and Don Towsley. 2012. Sampling directed graphs with random walks. In Proceedings of IEEE INFOCOM 2012. 1692--1700.Google ScholarCross Ref
Bruno F. Ribeiro and Don Towsley. 2012. On the estimation accuracy of degree distributions from graph sampling. In CDC. 5240--5247.Google Scholar
Matthew Richardson, Rakesh Agrawal, and Pedro Domingos. 2003. Trust management for the semantic web. In Proceedings of the 2nd International Semantic Web Conference. 351--368.Google ScholarDigital Library
Gareth O. Roberts and Jeffrey S. Rosenthal. 2004. General state space markov chains and MCMC algorithms. Probability Surveys 1 (2004), 20--71.Google ScholarCross Ref
Shai S. Shen-Orr, Ron Milo, Shmoolik Mangan, and Uri Alon. 2002. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics 31, 1 (May 2002), 64--68.Google ScholarCross Ref
Lubos Takac and Michal Zabovsky. 2012. Data analysis in public social networks. In International Scientific Conference and International Workshop Present Day Trends of Innovations. 1--6.Google Scholar
Johan Ugander, Lars Backstrom, and Jon Kleinberg. 2013. Subgraph frequencies: Mapping the empirical and extremal geography of large graph collections. In Proceedings of WWW 2013. 1307--1318. Google ScholarDigital Library
Sebastian Wernicke. 2006. Efficient detection of network motifs. IEEE/ACM Transactions on Computational Biology and Bioinformatics 3, 4 (2006), 347--359. Google ScholarDigital Library
Junzhou Zhao, John C. S. Lui, Don Towsley, Xiaohong Guan, and Yadong Zhou. 2011. Empirical analysis of the evolution of follower network: A case study on Douban. In Proceedings of IEEE INFOCOM NetSciCom 2011. 941--946.Google Scholar

Index Terms

Efficiently Estimating Motif Statistics of Large Networks
1. Applied computing
  1. Law, social and behavioral sciences

Recommendations

Unbiased Characterization of Node Pairs over Large Graphs
TKDD Special Issue (SIGKDD'13)

Characterizing user pair relationships is important for applications such as friend recommendation and interest targeting in online social networks (OSNs). Due to the large-scale nature of such networks, it is infeasible to enumerate all user pairs and ...
Read More
Walking on a graph with a magnifying glass: stratified sampling via weighted random walks
SIGMETRICS '11: Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems

Our objective is to sample the node set of a large unknown graph via crawling, to accurately estimate a given metric of interest. We design a random walk on an appropriately defined weighted graph that achieves high efficiency by preferentially crawling ...
Read More
SAKE: Estimating Katz Centrality Based on Sampling for Large-Scale Social Networks

Katz centrality is a fundamental concept to measure the influence of a vertex in a social network. However, existing approaches to calculating Katz centrality in a large-scale network are unpractical and computationally expensive. In this article, we ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM Transactions on Knowledge Discovery from Data Volume 9, Issue 2
November 2014
193 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/2672614
Editor:
Philip S. Yu
University of Illinois at Chicago, USA
Issue’s Table of Contents
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 September 2014
- Accepted: 1 March 2014
- Revised: 1 January 2014
- Received: 1 October 2013
Published in tkdd Volume 9, Issue 2

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Social network
graph sampling
network motifs
random walks
subgraph patterns
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 62
  Total Citations
  View Citations
- 753
  Total Downloads
- Downloads (Last 12 months)31
- Downloads (Last 6 weeks)10
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Efficiently Estimating Motif Statistics of Large Networks

ACM Transactions on Knowledge Discovery from Data

Abstract

References

Cited By

Index Terms

Recommendations

Unbiased Characterization of Node Pairs over Large Graphs

Walking on a graph with a magnifying glass: stratified sampling via weighted random walks

SAKE: Estimating Katz Centrality Based on Sampling for Large-Scale Social Networks

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Efficiently Estimating Motif Statistics of Large Networks

ACM Transactions on Knowledge Discovery from Data

Abstract

References

Cited By

Index Terms

Recommendations

Unbiased Characterization of Node Pairs over Large Graphs

Walking on a graph with a magnifying glass: stratified sampling via weighted random walks

SAKE: Estimating Katz Centrality Based on Sampling for Large-Scale Social Networks

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media