ABSTRACT
Estimating characteristics of large graphs via sampling is a vital part of the study of complex networks. Current sampling methods such as (independent) random vertex and random walks are useful but have drawbacks. Random vertex sampling may require too many resources (time, bandwidth, or money). Random walks, which normally require fewer resources per sample, can suffer from large estimation errors in the presence of disconnected or loosely connected graphs. In this work we propose a new m-dimensional random walk that uses m dependent random walkers. We show that the proposed sampling method, which we call Frontier sampling, exhibits all of the nice sampling properties of a regular random walk. At the same time, our simulations over large real world graphs show that, in the presence of disconnected or loosely connected components, Frontier sampling exhibits lower estimation errors than regular random walks. We also show that Frontier sampling is more suitable than random vertex sampling to sample the tail of the degree distribution of the graph.
- Dimitris Achlioptas, Aaron Clauset, David Kempe, and Cristopher Moore. On the bias of traceroute sampling: Or, power-law degree distributions in regular graphs. J. ACM, 56(4):1--28, 2009. Google ScholarDigital Library
- Chen Avin and Bhaskar Krishnamachari. The power of choice in random walks: An empirical study. Comput. Netw., 52(1):44--60, 2008. Google ScholarDigital Library
- P. Balister, B. Bollobás, and A. Stacey. Dependent percolation in two dimensions. Prob. Theory and Related Fields, 117(4):495--513, 2000.Google ScholarCross Ref
- Ziv Bar-Yossef and Maxim Gurevich. Random sampling from a search engine's index. J. ACM, 55(5):1--74, 2008. Google ScholarDigital Library
- A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509--512, 1999.Google ScholarCross Ref
- Nabhendra Bisnik and Alhussein A. Abouzeid. Optimizing random walk search algorithms in p2p networks. Computer Networks, 51(6):1499--1514, 2007. Google ScholarDigital Library
- S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang. Complex networks: Structure and dynamics. Physics Reports, 424(4--5):175--308, 2006.Google Scholar
- Christos G. Cassandras and Stephane Lafortune. Introduction to Discrete Event Systems. Springer-Verlag, Inc., 2006. Google ScholarDigital Library
- Fang Chen, László Lovász, and Igor Pak. Lifting Markov chains to speed up mixing. In Proc. of STOC, pages 275--281, 1999. Google ScholarDigital Library
- Junghoo Cho and Hector Garcia-Molina. Parallel crawlers. In Proc. of the WWW, pages 124--135, 2002. Google ScholarDigital Library
- Nathan Eagle, Alex S. Pentland, and David Lazer. Inferring friendship network structure by using mobile phone data. PNAS, 106(36):15274--15278, August 2009.Google ScholarCross Ref
- W. Feller. An Introduction to Probability Theory and its Applications, volume 1. John Wiley & Sons, New York, 3rd edition, 1968.Google Scholar
- Coperative Association for Internet Data Analysis. CAIDA's Internet topology data kit #0304, 2003.Google Scholar
- Charles J. Geyer. Practical Markov Chain Monte Carlo. Statistical Science, 7(4):473--483, 1992.Google ScholarCross Ref
- Minas Gjoka, Maciej Kurant, Carter T. Butts, and Athina Markopoulou. A walk in Facebook: Uniform sampling of users in online social networks. In Proc. of the IEEE Infocom, March 2010.Google Scholar
- Christos Gkantsidis, Milena Mihail, and Amin Saberi. Random walks in peer-to-peer networks: algorithms and evaluation. Perform. Eval., 63(3):241--263, March 2006. Google ScholarDigital Library
- Monika R. Henzinger, Allan Heydon, Michael Mitzenmacher, and Marc Najork. On near-uniform url sampling. In Proceedings of the WWW, pages 295--308, 2000. Google ScholarDigital Library
- Marlom A. Konrath, Marinho P. Barcellos, and Rodrigo B. Mansilha. Attacking a swarm with a band of liars: evaluating the impact of attacks on bittorrent. In P2P '07: Proceedings of the Seventh IEEE International Conference on Peer-to-Peer Computing, pages 37--44, Washington, DC, USA, 2007. IEEE Computer Society. Google ScholarDigital Library
- Jure Leskovec and Christos Faloutsos. Sampling from large graphs. In Proc. of the KDD, pages 631--636, 2006. Google ScholarDigital Library
- Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. Statistical properties of community structure in large social and information networks. In Proc. of the WWW, pages 695--704, 2008. Google ScholarDigital Library
- David A. Levin, Yuval Peres, and Elizabeth L. Wilmer. Markov Chains and Mixing Times. AMS, 2009.Google Scholar
- L. Lovász. Random walks on graphs: a survey. Combinatorics, 2:1--46, 1993.Google Scholar
- Laurent Massoulié, Erwan Le Merrer, Anne-Marie Kermarrec, and Ayalvadi Ganesh. Peer counting and sampling in overlay networks: random walk methods. In Proc. of the PODC, pages 123--132, 2006. Google ScholarDigital Library
- Courtney McKnight, Don Des Jarlais, Heidi Bramson, Lisa Tower, Abu S. Abdul-Quader, Chris Nemeth, and Douglas Heckathorn. Respondent-driven sampling in a study of drug users in New York City: Notes from the field. Journal of Urban Health, 83(6):154--159, 2006.Google Scholar
- Sean Meyn and Richard L. Tweedie. Markov Chains and Stochastic Stability. Cambridge University Press, 2009. Google ScholarDigital Library
- Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. Measurement and Analysis of Online Social Networks. In Proc. of the IMC, October 2007. Google ScholarDigital Library
- M. E. J. Newman. Assortative mixing in networks. Phys. Rev. Lett., 89(20):208701, Oct 2002.Google ScholarCross Ref
- M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167--256, 2003.Google ScholarDigital Library
- Amir H. Rasti, Mojtaba Torkjazi, Reza Rejaie, Nick Duffield, Walter Willinger, and Daniel Stutzbach. Respondent-driven sampling for characterizing unstructured overlays. In Proc. of the IEEE Infocom, pages 2701--2705, April 2009.Google ScholarCross Ref
- Bruno Ribeiro, William Gauvin, Benyuan Liu, and Don Towsley. On MySpace account spans and double Pareto-like distribution of friends. In IEEE Infocom 2010 Network Science Workshop, Mar 2010.Google ScholarCross Ref
- Bruno Ribeiro and Donald Towsley. Estimating and sampling graphs with multidimensional random walks. arXiv:1002.1751v2, 2010.Google Scholar
- Christian P. Robert and George Casella. Monte Carlo Statistical Methods. Springer-Verlag, 2nd edition, 2005. Google ScholarDigital Library
- Paat Rusmevichientong, David M. Pennock, Steve Lawrence, and Lee C. Giles. Methods for sampling pages uniformly from the world wide web. In AAAI Fall Symposium on Using Uncertainty Within Computation, pages 121--128, 2001.Google Scholar
- Thomas Schank and Dorothea Wagner. Approximating clustering-coefficient and transitivity. Journal of Graph Algorithms and Applications, 9(2):265--275, 2004.Google ScholarCross Ref
- Daniel Stutzbach, Reza Rejaie, Nick Duffield, Subhabrata Sen, and Walter Willinger. On unbiased sampling for unstructured peer-to-peer networks. IEEE/ACM Trans. Netw., 17(2):377--390, 2009. Google ScholarDigital Library
- Erik Volz and Douglas D. Heckathorn. Probability based estimation theory for Respondent-Driven Sampling. Journal of Official Statistics, 2008.Google Scholar
- D.J. Watts and S.H. Strogatz. Collective dynamics of "small world" networks. Nature, 393:440--442, June 1998.Google ScholarCross Ref
- Ming Zhong and Kai Shen. Random walk based node sampling in self-organizing networks. SIGOPS Oper. Syst. Rev., 40(3):49--55, 2006. Google ScholarDigital Library
Index Terms
- Estimating and sampling graphs with multidimensional random walks
Recommendations
Fast Random Walks on Finite Graphs and Graph Topological Information
ICNC '11: Proceedings of the 2011 Second International Conference on Networking and ComputingA random walk on a graph is a process in which a particle on a vertex repeatedly moves to its adjacent vertex according to transition probability, which is given in advance. The behavior of random walks depend on its transition probability, and the ``...
Fast distributed random walks
PODC '09: Proceedings of the 28th ACM symposium on Principles of distributed computingPerforming random walks in networks is a fundamental primitive that has found applications in many areas of computer science, including distributed computing. In this paper, we focus on the problem of performing random walks efficiently in a distributed ...
Multiple Random Walks in Random Regular Graphs
We study properties of multiple random walks on a graph under various assumptions of interaction between the particles. To give precise results, we make the analysis for random regular graphs. The cover time of a random walk on a random $r$-regular ...
Comments