research-article

Walking on a graph with a magnifying glass: stratified sampling via weighted random walks

Authors:
Maciej Kurant

UC Irvine, Irvine, CA, USA

UC Irvine, Irvine, CA, USA
View Profile

,
Minas Gjoka

UC Irvine, Irvine, CA, USA

UC Irvine, Irvine, CA, USA
View Profile

,
Carter T. Butts

UC Irvine, Irvine, CA, USA

UC Irvine, Irvine, CA, USA
View Profile

,
Athina Markopoulou

UC Irvine, Irvine, CA, USA

UC Irvine, Irvine, CA, USA
View Profile

SIGMETRICS '11: Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systemsJune 2011Pages 281–292https://doi.org/10.1145/1993744.1993773

Published:07 June 2011Publication History

SIGMETRICS '11: Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems

Pages 281–292

ABSTRACT

Our objective is to sample the node set of a large unknown graph via crawling, to accurately estimate a given metric of interest. We design a random walk on an appropriately defined weighted graph that achieves high efficiency by preferentially crawling those nodes and edges that convey greater information regarding the target metric. Our approach begins by employing the theory of stratification to find optimal node weights, for a given estimation problem, under an independence sampler. While optimal under independence sampling, these weights may be impractical under graph crawling due to constraints arising from the structure of the graph. Therefore, the edge weights for our random walk should be chosen so as to lead to an equilibrium distribution that strikes a balance between approximating the optimal weights under an independence sampler and achieving fast convergence. We propose a heuristic approach (stratified weighted random walk, or S-WRW) that achieves this goal, while using only limited information about the graph structure and the node properties. We evaluate our technique in simulation, and experimentally, by collecting a sample of Facebook college users. We show that S-WRW requires 13-15 times fewer samples than the simple re-weighted random walk (RW) to achieve the same estimation accuracy for a range of metrics.

Supplemental Material

metrics_8_1.mp4

mp4

140.4 MB

Download

References

Weighted Random Walks of the Facebook social graph: http://odysseas.calit2.uci.edu/osn, 2011.Google Scholar
D. Achlioptas, A. Clauset, D. Kempe, and C. On the bias of traceroute sampling: or, power-law degree distributions in regular graphs. Journal of the ACM, 2009. Google ScholarDigital Library
Y. Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong. Analysis of topological characteristics of huge online social networking services. In WWW, pages 835--844, 2007. Google ScholarDigital Library
D. Aldous and J. A. Fill. Reversible Markov Chains and Random Walks on Graphs. In preparation.Google Scholar
K. Avrachenkov, B. Ribeiro, and D. Towsley. Improving Random Walk Estimation Accuracy with Uniform Restarts. In I7th Workshop on Algorithms and Models for the Web Graph, 2010.Google Scholar
L. Backstrom and J. Kleinberg. Network Bucket Testing. In WWW, 2011. Google ScholarDigital Library
L. Backstrom and J. Leskovec. Supervised Random Walks: Predicting and Recommending Links in Social Networks. In ACM International Conference on Web Search and Data Minig (WSDM), 2011. Google ScholarDigital Library
L. Becchetti, C. Castillo, D. Donato, and A. Fazzone. A comparison of sampling techniques for web graph characterization. In LinkKDD, 2006.Google Scholar
H. R. Bernard, T. Hallett, A. Iovita, E. C. Johnsen, R. Lyerla, C. McCarty, M. Mahy, M. J. Salganik, T. Saliuk, O. Scutelniciuc, G. a. Shelley, P. Sirinirund, S. Weir, and D. F. Stroup. Counting hard-to-count populations: the network scale-up method for public health. Sexually Transmitted Infections, 86(Suppl 2):ii11--ii15, Nov. 2010.Google Scholar
S. Boyd, P. Diaconis, and L. Xiao. Fastest mixing Markov chain on a graph. SIAM review, 46(4):667--689, 2004. Google ScholarDigital Library
S. Chakrabarti. Focused crawling: a new approach to topic-specific Web resource discovery. Computer Networks, 31(11-16):1623--1640, May 1999. Google ScholarDigital Library
W. G. Cochran. Sampling Techniques, volume 20 of McGraw-Hil Series in Probability and Statistics. Wiley, 1977.Google Scholar
M. Diligenti, F. Coetzee, S. Lawrence, C. Giles, and M. Gori. Focused crawling using context graphs. In Proceedings of the 26th International Conference on Very Large Data Bases, pages 527--534, 2000. Google ScholarDigital Library
N. Duffield, C. Lund, and M. Thorup. Learn more, sample less: control of volume and variance in network measurement. IEEE Transactions on Information Theory, 51(5):1756--1775, May 2005. Google ScholarDigital Library
J. Gentle. Random number generation and Monte Carlo methods. Springer Verlag, 2003.Google Scholar
W. R. Gilks, S. Richardson, and D. J. Spiegelhalter. Markov Chain Monte Carlo in Practice. Chapman and Hall/CRC, 1996.Google Scholar
M. Gjoka, C. Butts, M. Kurant, and A. Markopoulou. Multigraph Sampling of Online Social Networks. arXiv, (arXiv:1008.2565v1):1--10, 2010.Google Scholar
M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. Walking in Facebook: A Case Study of Unbiased Sampling of OSNs. In INFOCOM, 2010. Google ScholarDigital Library
C. Gkantsidis, M. Mihail, and A. Saberi. Random walks in peer-to-peer networks. In INFOCOM, 2004.Google ScholarCross Ref
M. Hansen and W. Hurwitz. On the Theory of Sampling from Finite Populations. Annals of Mathematical Statistics, 14(3), 1943.Google Scholar
D. D. Heckathorn. Respondent-Driven Sampling: A New Approach to the Study of Hidden Populations. Social Problems, 44:174--199, 1997.Google ScholarCross Ref
M. R. Henzinger, A. Heydon, M. Mitzenmacher, and M. Najork. On near-uniform URL sampling. In WWW, 2000. Google ScholarDigital Library
E. D. Kolaczyk. Statistical Analysis of Network Data, volume 69 of Springer Series in Statistics. Springer New York, 2009. Google ScholarDigital Library
B. Krishnamurthy, P. Gill, and M. Arlitt. A few chirps about Twitter. In WOSN, 2008. Google ScholarDigital Library
M. Kurant, M. Gjoka, C. Butts, and A. Markopoulou. Walking on a Graph with a Magnifying Glass. Arxiv preprint arXiv:1101.5463, 2011. Google ScholarDigital Library
M. Kurant, A. Markopoulou, and P. Thiran. On the bias of BFS (Breadth First Search). In ITC, also in arXiv:1004.1729, 2010.Google Scholar
S. H. Lee, P.-J. Kim, and H. Jeong. Statistical properties of Sampled Networks. Phys. Rev. E, 73:16102, 2006.Google ScholarCross Ref
J. Leskovec and C. Faloutsos. Sampling from large graphs. In KDD, pages 631--636, 2006. Google ScholarDigital Library
S. Lohr. Sampling: design and analysis. Brooks/Cole, second edition, 2009.Google Scholar
L. Lovász. Random walks on graphs: A survey. Combinatorics, Paul Erdos is Eighty, 2(1):1--46, 1993.Google Scholar
N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculation by fast computing machines. Journal of Chemical Physics, 21:1087--1092, 1953.Google ScholarCross Ref
A. Mislove, H. S. Koppula, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Growth of the Flickr social network. In WOSN, 2008. Google ScholarDigital Library
A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In IMC, pages 29--42, 2007. Google ScholarDigital Library
A. Mohaisen, A. Yun, and Y. Kim. Measuring the mixing time of social graphs. IMC, 2010. Google ScholarDigital Library
J. Neyman. On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection. Journal of the Royal Statistical Society, 97(4):558, 1934.Google ScholarCross Ref
A. Rasti, M. Torkjazi, R. Rejaie, N. Duffield, W. Willinger, and D. Stutzbach. Respondent-driven sampling for characterizing unstructured overlays. In Infocom Mini-conference, pages 2701--2705, 2009.Google ScholarCross Ref
A. H. Rasti, M. Torkjazi, R. Rejaie, and D. Stutzbach. Evaluating Sampling Techniques for Large Dynamic Graphs. In Technical Report, volume 1, 2008.Google Scholar
B. Ribeiro and D. Towsley. Estimating and sampling graphs with multidimensional random walks. In IMC, volume 011, 2010. Google ScholarDigital Library
B. Ribeiro, P. Wang, and D. Towsley. On Estimating Degree Distributions of Directed Graphs through Sampling. UMass Technical Report, 2010.Google Scholar
M. Salganik and D. D. Heckathorn. Sampling and estimation in hidden populations using respondent-driven sampling. Sociological Methodology, 34(1):193--240, 2004.Google ScholarCross Ref
D. Stutzbach, R. Rejaie, N. Duffield, S. Sen, and W. Willinger. On unbiased sampling for unstructured peer-to-peer networks. In IMC, 2006. Google ScholarDigital Library
E. Volz and D. D. Heckathorn. Probability based estimation theory for respondent driven sampling. Journal of Official Statistics, 24(1):79--97, 2008.Google Scholar
W. Willinger, R. Rejaie, M. Torkjazi, M. Valafar, and M. Maggioni. OSN Research: Time to Face the Real Challenges. In HotMetrics, 2009.Google Scholar
C. Wilson, B. Boe, A. Sala, K. P. N. Puttaswamy, and B. Y. Zhao. User interactions in social networks and their implications. In EuroSys, 2009. Google ScholarDigital Library

Index Terms

Walking on a graph with a magnifying glass: stratified sampling via weighted random walks
1. General and reference
  1. Cross-computing tools and techniques
    1. Measurement
    2. Metrics

Recommendations

Walking on a graph with a magnifying glass: stratified sampling via weighted random walks
Performance evaluation review

Our objective is to sample the node set of a large unknown graph via crawling, to accurately estimate a given metric of interest. We design a random walk on an appropriately defined weighted graph that achieves high efficiency by preferentially crawling ...
Read More
Walking in facebook: a case study of unbiased sampling of OSNs
INFOCOM'10: Proceedings of the 29th conference on Information communications

With more than 250 million active users [1], Facebook (FB) is currently one of the most important online social networks. Our goal in this paper is to obtain a representative (unbiased) sample of Facebook users by crawling its social graph. In this ...
Read More
Unbiased Sampling of Bipartite Graph
CYBERC '11: Proceedings of the 2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery

Increasing size of online social networks (OSNs) has given rise to sampling method studies that provide a relatively small but representative sample of large-scale OSNs so that the measurement and analysis burden can be affordable. So far, a number of ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMETRICS '11: Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
June 2011
376 pages
ISBN:9781450308144
DOI:10.1145/1993744
General Chair:
Arif Merchant
Google, USA
,
Program Chairs:
Kimberly Keeton
HP Labs, USA
,
Dan Rubenstein
Columbia University, USA
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 7 June 2011
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
graph sampling
online social networks
random walks
stratified sampling
weighted graphs
weighted random walks
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate459of2,691submissions,17%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 94
  Total Citations
  View Citations
- 708
  Total Downloads
- Downloads (Last 12 months)20
- Downloads (Last 6 weeks)5
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Walking on a graph with a magnifying glass: stratified sampling via weighted random walks

SIGMETRICS '11: Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Walking on a graph with a magnifying glass: stratified sampling via weighted random walks

Walking in facebook: a case study of unbiased sampling of OSNs

Unbiased Sampling of Bipartite Graph

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Walking on a graph with a magnifying glass: stratified sampling via weighted random walks

SIGMETRICS '11: Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Walking on a graph with a magnifying glass: stratified sampling via weighted random walks

Walking in facebook: a case study of unbiased sampling of OSNs

Unbiased Sampling of Bipartite Graph

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media