skip to main content
10.1145/1879141.1879192acmconferencesArticle/Chapter ViewAbstractPublication PagesimcConference Proceedingsconference-collections
research-article

Estimating and sampling graphs with multidimensional random walks

Authors Info & Claims
Published:01 November 2010Publication History

ABSTRACT

Estimating characteristics of large graphs via sampling is a vital part of the study of complex networks. Current sampling methods such as (independent) random vertex and random walks are useful but have drawbacks. Random vertex sampling may require too many resources (time, bandwidth, or money). Random walks, which normally require fewer resources per sample, can suffer from large estimation errors in the presence of disconnected or loosely connected graphs. In this work we propose a new m-dimensional random walk that uses m dependent random walkers. We show that the proposed sampling method, which we call Frontier sampling, exhibits all of the nice sampling properties of a regular random walk. At the same time, our simulations over large real world graphs show that, in the presence of disconnected or loosely connected components, Frontier sampling exhibits lower estimation errors than regular random walks. We also show that Frontier sampling is more suitable than random vertex sampling to sample the tail of the degree distribution of the graph.

References

  1. Dimitris Achlioptas, Aaron Clauset, David Kempe, and Cristopher Moore. On the bias of traceroute sampling: Or, power-law degree distributions in regular graphs. J. ACM, 56(4):1--28, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Chen Avin and Bhaskar Krishnamachari. The power of choice in random walks: An empirical study. Comput. Netw., 52(1):44--60, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. Balister, B. Bollobás, and A. Stacey. Dependent percolation in two dimensions. Prob. Theory and Related Fields, 117(4):495--513, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  4. Ziv Bar-Yossef and Maxim Gurevich. Random sampling from a search engine's index. J. ACM, 55(5):1--74, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509--512, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  6. Nabhendra Bisnik and Alhussein A. Abouzeid. Optimizing random walk search algorithms in p2p networks. Computer Networks, 51(6):1499--1514, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang. Complex networks: Structure and dynamics. Physics Reports, 424(4--5):175--308, 2006.Google ScholarGoogle Scholar
  8. Christos G. Cassandras and Stephane Lafortune. Introduction to Discrete Event Systems. Springer-Verlag, Inc., 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Fang Chen, László Lovász, and Igor Pak. Lifting Markov chains to speed up mixing. In Proc. of STOC, pages 275--281, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Junghoo Cho and Hector Garcia-Molina. Parallel crawlers. In Proc. of the WWW, pages 124--135, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Nathan Eagle, Alex S. Pentland, and David Lazer. Inferring friendship network structure by using mobile phone data. PNAS, 106(36):15274--15278, August 2009.Google ScholarGoogle ScholarCross RefCross Ref
  12. W. Feller. An Introduction to Probability Theory and its Applications, volume 1. John Wiley & Sons, New York, 3rd edition, 1968.Google ScholarGoogle Scholar
  13. Coperative Association for Internet Data Analysis. CAIDA's Internet topology data kit #0304, 2003.Google ScholarGoogle Scholar
  14. Charles J. Geyer. Practical Markov Chain Monte Carlo. Statistical Science, 7(4):473--483, 1992.Google ScholarGoogle ScholarCross RefCross Ref
  15. Minas Gjoka, Maciej Kurant, Carter T. Butts, and Athina Markopoulou. A walk in Facebook: Uniform sampling of users in online social networks. In Proc. of the IEEE Infocom, March 2010.Google ScholarGoogle Scholar
  16. Christos Gkantsidis, Milena Mihail, and Amin Saberi. Random walks in peer-to-peer networks: algorithms and evaluation. Perform. Eval., 63(3):241--263, March 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Monika R. Henzinger, Allan Heydon, Michael Mitzenmacher, and Marc Najork. On near-uniform url sampling. In Proceedings of the WWW, pages 295--308, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Marlom A. Konrath, Marinho P. Barcellos, and Rodrigo B. Mansilha. Attacking a swarm with a band of liars: evaluating the impact of attacks on bittorrent. In P2P '07: Proceedings of the Seventh IEEE International Conference on Peer-to-Peer Computing, pages 37--44, Washington, DC, USA, 2007. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Jure Leskovec and Christos Faloutsos. Sampling from large graphs. In Proc. of the KDD, pages 631--636, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. Statistical properties of community structure in large social and information networks. In Proc. of the WWW, pages 695--704, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. David A. Levin, Yuval Peres, and Elizabeth L. Wilmer. Markov Chains and Mixing Times. AMS, 2009.Google ScholarGoogle Scholar
  22. L. Lovász. Random walks on graphs: a survey. Combinatorics, 2:1--46, 1993.Google ScholarGoogle Scholar
  23. Laurent Massoulié, Erwan Le Merrer, Anne-Marie Kermarrec, and Ayalvadi Ganesh. Peer counting and sampling in overlay networks: random walk methods. In Proc. of the PODC, pages 123--132, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Courtney McKnight, Don Des Jarlais, Heidi Bramson, Lisa Tower, Abu S. Abdul-Quader, Chris Nemeth, and Douglas Heckathorn. Respondent-driven sampling in a study of drug users in New York City: Notes from the field. Journal of Urban Health, 83(6):154--159, 2006.Google ScholarGoogle Scholar
  25. Sean Meyn and Richard L. Tweedie. Markov Chains and Stochastic Stability. Cambridge University Press, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby Bhattacharjee. Measurement and Analysis of Online Social Networks. In Proc. of the IMC, October 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. M. E. J. Newman. Assortative mixing in networks. Phys. Rev. Lett., 89(20):208701, Oct 2002.Google ScholarGoogle ScholarCross RefCross Ref
  28. M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167--256, 2003.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Amir H. Rasti, Mojtaba Torkjazi, Reza Rejaie, Nick Duffield, Walter Willinger, and Daniel Stutzbach. Respondent-driven sampling for characterizing unstructured overlays. In Proc. of the IEEE Infocom, pages 2701--2705, April 2009.Google ScholarGoogle ScholarCross RefCross Ref
  30. Bruno Ribeiro, William Gauvin, Benyuan Liu, and Don Towsley. On MySpace account spans and double Pareto-like distribution of friends. In IEEE Infocom 2010 Network Science Workshop, Mar 2010.Google ScholarGoogle ScholarCross RefCross Ref
  31. Bruno Ribeiro and Donald Towsley. Estimating and sampling graphs with multidimensional random walks. arXiv:1002.1751v2, 2010.Google ScholarGoogle Scholar
  32. Christian P. Robert and George Casella. Monte Carlo Statistical Methods. Springer-Verlag, 2nd edition, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Paat Rusmevichientong, David M. Pennock, Steve Lawrence, and Lee C. Giles. Methods for sampling pages uniformly from the world wide web. In AAAI Fall Symposium on Using Uncertainty Within Computation, pages 121--128, 2001.Google ScholarGoogle Scholar
  34. Thomas Schank and Dorothea Wagner. Approximating clustering-coefficient and transitivity. Journal of Graph Algorithms and Applications, 9(2):265--275, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  35. Daniel Stutzbach, Reza Rejaie, Nick Duffield, Subhabrata Sen, and Walter Willinger. On unbiased sampling for unstructured peer-to-peer networks. IEEE/ACM Trans. Netw., 17(2):377--390, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Erik Volz and Douglas D. Heckathorn. Probability based estimation theory for Respondent-Driven Sampling. Journal of Official Statistics, 2008.Google ScholarGoogle Scholar
  37. D.J. Watts and S.H. Strogatz. Collective dynamics of "small world" networks. Nature, 393:440--442, June 1998.Google ScholarGoogle ScholarCross RefCross Ref
  38. Ming Zhong and Kai Shen. Random walk based node sampling in self-organizing networks. SIGOPS Oper. Syst. Rev., 40(3):49--55, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Estimating and sampling graphs with multidimensional random walks

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      IMC '10: Proceedings of the 10th ACM SIGCOMM conference on Internet measurement
      November 2010
      496 pages
      ISBN:9781450304832
      DOI:10.1145/1879141
      • Program Chair:
      • Mark Allman

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 November 2010

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate277of1,083submissions,26%

      Upcoming Conference

      IMC '24
      ACM Internet Measurement Conference
      November 4 - 6, 2024
      Madrid , AA , Spain

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader