research-article

Estimating and sampling graphs with multidimensional random walks

Authors:
Bruno Ribeiro

University of Massachusetts Amherst, Amherst, MA, USA

University of Massachusetts Amherst, Amherst, MA, USA
View Profile

,
Don Towsley

University of Massachusetts Amherst, Amherst, MA, USA

University of Massachusetts Amherst, Amherst, MA, USA
View Profile

IMC '10: Proceedings of the 10th ACM SIGCOMM conference on Internet measurementNovember 2010Pages 390–403https://doi.org/10.1145/1879141.1879192

Published:01 November 2010Publication History

IMC '10: Proceedings of the 10th ACM SIGCOMM conference on Internet measurement

Pages 390–403

ABSTRACT

Estimating characteristics of large graphs via sampling is a vital part of the study of complex networks. Current sampling methods such as (independent) random vertex and random walks are useful but have drawbacks. Random vertex sampling may require too many resources (time, bandwidth, or money). Random walks, which normally require fewer resources per sample, can suffer from large estimation errors in the presence of disconnected or loosely connected graphs. In this work we propose a new m-dimensional random walk that uses m dependent random walkers. We show that the proposed sampling method, which we call Frontier sampling, exhibits all of the nice sampling properties of a regular random walk. At the same time, our simulations over large real world graphs show that, in the presence of disconnected or loosely connected components, Frontier sampling exhibits lower estimation errors than regular random walks. We also show that Frontier sampling is more suitable than random vertex sampling to sample the tail of the degree distribution of the graph.

References

Dimitris Achlioptas, Aaron Clauset, David Kempe, and Cristopher Moore. On the bias of traceroute sampling: Or, power-law degree distributions in regular graphs. J. ACM, 56(4):1--28, 2009. Google ScholarDigital Library
Chen Avin and Bhaskar Krishnamachari. The power of choice in random walks: An empirical study. Comput. Netw., 52(1):44--60, 2008. Google ScholarDigital Library
P. Balister, B. Bollobás, and A. Stacey. Dependent percolation in two dimensions. Prob. Theory and Related Fields, 117(4):495--513, 2000.Google ScholarCross Ref
Ziv Bar-Yossef and Maxim Gurevich. Random sampling from a search engine's index. J. ACM, 55(5):1--74, 2008. Google ScholarDigital Library
A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509--512, 1999.Google ScholarCross Ref
Nabhendra Bisnik and Alhussein A. Abouzeid. Optimizing random walk search algorithms in p2p networks. Computer Networks, 51(6):1499--1514, 2007. Google ScholarDigital Library
S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang. Complex networks: Structure and dynamics. Physics Reports, 424(4--5):175--308, 2006.Google Scholar
Christos G. Cassandras and Stephane Lafortune. Introduction to Discrete Event Systems. Springer-Verlag, Inc., 2006. Google ScholarDigital Library
Fang Chen, László Lovász, and Igor Pak. Lifting Markov chains to speed up mixing. In Proc. of STOC, pages 275--281, 1999. Google ScholarDigital Library
Junghoo Cho and Hector Garcia-Molina. Parallel crawlers. In Proc. of the WWW, pages 124--135, 2002. Google ScholarDigital Library
Nathan Eagle, Alex S. Pentland, and David Lazer. Inferring friendship network structure by using mobile phone data. PNAS, 106(36):15274--15278, August 2009.Google ScholarCross Ref
W. Feller. An Introduction to Probability Theory and its Applications, volume 1. John Wiley & Sons, New York, 3rd edition, 1968.Google Scholar
Coperative Association for Internet Data Analysis. CAIDA's Internet topology data kit #0304, 2003.Google Scholar
Charles J. Geyer. Practical Markov Chain Monte Carlo. Statistical Science, 7(4):473--483, 1992.Google ScholarCross Ref
Minas Gjoka, Maciej Kurant, Carter T. Butts, and Athina Markopoulou. A walk in Facebook: Uniform sampling of users in online social networks. In Proc. of the IEEE Infocom, March 2010.Google Scholar
Christos Gkantsidis, Milena Mihail, and Amin Saberi. Random walks in peer-to-peer networks: algorithms and evaluation. Perform. Eval., 63(3):241--263, March 2006. Google ScholarDigital Library
Monika R. Henzinger, Allan Heydon, Michael Mitzenmacher, and Marc Najork. On near-uniform url sampling. In Proceedings of the WWW, pages 295--308, 2000. Google ScholarDigital Library
Marlom A. Konrath, Marinho P. Barcellos, and Rodrigo B. Mansilha. Attacking a swarm with a band of liars: evaluating the impact of attacks on bittorrent. In P2P '07: Proceedings of the Seventh IEEE International Conference on Peer-to-Peer Computing, pages 37--44, Washington, DC, USA, 2007. IEEE Computer Society. Google ScholarDigital Library
Jure Leskovec and Christos Faloutsos. Sampling from large graphs. In Proc. of the KDD, pages 631--636, 2006. Google ScholarDigital Library
Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. Statistical properties of community structure in large social and information networks. In Proc. of the WWW, pages 695--704, 2008. Google ScholarDigital Library
David A. Levin, Yuval Peres, and Elizabeth L. Wilmer. Markov Chains and Mixing Times. AMS, 2009.Google Scholar
L. Lovász. Random walks on graphs: a survey. Combinatorics, 2:1--46, 1993.Google Scholar
Laurent Massoulié, Erwan Le Merrer, Anne-Marie Kermarrec, and Ayalvadi Ganesh. Peer counting and sampling in overlay networks: random walk methods. In Proc. of the PODC, pages 123--132, 2006. Google ScholarDigital Library
Courtney McKnight, Don Des Jarlais, Heidi Bramson, Lisa Tower, Abu S. Abdul-Quader, Chris Nemeth, and Douglas Heckathorn. Respondent-driven sampling in a study of drug users in New York City: Notes from the field. Journal of Urban Health, 83(6):154--159, 2006.Google Scholar
Sean Meyn and Richard L. Tweedie. Markov Chains and Stochastic Stability. Cambridge University Press, 2009. Google ScholarDigital Library
Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. Measurement and Analysis of Online Social Networks. In Proc. of the IMC, October 2007. Google ScholarDigital Library
M. E. J. Newman. Assortative mixing in networks. Phys. Rev. Lett., 89(20):208701, Oct 2002.Google ScholarCross Ref
M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167--256, 2003.Google ScholarDigital Library
Amir H. Rasti, Mojtaba Torkjazi, Reza Rejaie, Nick Duffield, Walter Willinger, and Daniel Stutzbach. Respondent-driven sampling for characterizing unstructured overlays. In Proc. of the IEEE Infocom, pages 2701--2705, April 2009.Google ScholarCross Ref
Bruno Ribeiro, William Gauvin, Benyuan Liu, and Don Towsley. On MySpace account spans and double Pareto-like distribution of friends. In IEEE Infocom 2010 Network Science Workshop, Mar 2010.Google ScholarCross Ref
Bruno Ribeiro and Donald Towsley. Estimating and sampling graphs with multidimensional random walks. arXiv:1002.1751v2, 2010.Google Scholar
Christian P. Robert and George Casella. Monte Carlo Statistical Methods. Springer-Verlag, 2nd edition, 2005. Google ScholarDigital Library
Paat Rusmevichientong, David M. Pennock, Steve Lawrence, and Lee C. Giles. Methods for sampling pages uniformly from the world wide web. In AAAI Fall Symposium on Using Uncertainty Within Computation, pages 121--128, 2001.Google Scholar
Thomas Schank and Dorothea Wagner. Approximating clustering-coefficient and transitivity. Journal of Graph Algorithms and Applications, 9(2):265--275, 2004.Google ScholarCross Ref
Daniel Stutzbach, Reza Rejaie, Nick Duffield, Subhabrata Sen, and Walter Willinger. On unbiased sampling for unstructured peer-to-peer networks. IEEE/ACM Trans. Netw., 17(2):377--390, 2009. Google ScholarDigital Library
Erik Volz and Douglas D. Heckathorn. Probability based estimation theory for Respondent-Driven Sampling. Journal of Official Statistics, 2008.Google Scholar
D.J. Watts and S.H. Strogatz. Collective dynamics of "small world" networks. Nature, 393:440--442, June 1998.Google ScholarCross Ref
Ming Zhong and Kai Shen. Random walk based node sampling in self-organizing networks. SIGOPS Oper. Syst. Rev., 40(3):49--55, 2006. Google ScholarDigital Library

Index Terms

Estimating and sampling graphs with multidimensional random walks
1. Mathematics of computing
  1. Probability and statistics
    1. Statistical paradigms
      1. Statistical graphics

Recommendations

Fast Random Walks on Finite Graphs and Graph Topological Information
ICNC '11: Proceedings of the 2011 Second International Conference on Networking and Computing

A random walk on a graph is a process in which a particle on a vertex repeatedly moves to its adjacent vertex according to transition probability, which is given in advance. The behavior of random walks depend on its transition probability, and the ``...
Read More
Fast distributed random walks
PODC '09: Proceedings of the 28th ACM symposium on Principles of distributed computing

Performing random walks in networks is a fundamental primitive that has found applications in many areas of computer science, including distributed computing. In this paper, we focus on the problem of performing random walks efficiently in a distributed ...
Read More
Multiple Random Walks in Random Regular Graphs

We study properties of multiple random walks on a graph under various assumptions of interaction between the particles. To give precise results, we make the analysis for random regular graphs. The cover time of a random walk on a random $r$-regular ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
IMC '10: Proceedings of the 10th ACM SIGCOMM conference on Internet measurement
November 2010
496 pages
ISBN:9781450304832
DOI:10.1145/1879141
Program Chair:
Mark Allman
International Computer Science Institute
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 November 2010
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
estimates
frontier sampling
power law degree distribution
random walks
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate277of1,083submissions,26%
Upcoming Conference
IMC '24

Sponsor:

sigcomm

sigcomm

ACM Internet Measurement Conference

November 4 - 6, 2024

Madrid , AA , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 277
  Total Citations
  View Citations
- 1,053
  Total Downloads
- Downloads (Last 12 months)39
- Downloads (Last 6 weeks)7
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Estimating and sampling graphs with multidimensional random walks

IMC '10: Proceedings of the 10th ACM SIGCOMM conference on Internet measurement

ABSTRACT

References

Cited By

Index Terms

Recommendations

Fast Random Walks on Finite Graphs and Graph Topological Information

Fast distributed random walks

Multiple Random Walks in Random Regular Graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Estimating and sampling graphs with multidimensional random walks

IMC '10: Proceedings of the 10th ACM SIGCOMM conference on Internet measurement

ABSTRACT

References

Cited By

Index Terms

Recommendations

Fast Random Walks on Finite Graphs and Graph Topological Information

Fast distributed random walks

Multiple Random Walks in Random Regular Graphs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media