skip to main content
10.1145/1963405.1963489acmotherconferencesArticle/Chapter ViewAbstractPublication PageswwwConference Proceedingsconference-collections
research-article

Estimating sizes of social networks via biased sampling

Published:28 March 2011Publication History

ABSTRACT

Online social networks have become very popular in recent years and their number of users is already measured in many hundreds of millions. For various commercial and sociological purposes, an independent estimate of their sizes is important. In this work, algorithms for estimating the number of users in such networks are considered. The proposed schemes are also applicable for estimating the sizes of networks' sub-populations. The suggested algorithms interact with the social networks via their public APIs only, and rely on no other external information. Due to obvious traffic and privacy concerns, the number of such interactions is severely limited. We therefore focus on minimizing the number of API interactions needed for producing good size estimates. We adopt the abstraction of social networks as undirected graphs and use random node sampling. By counting the number of collisions or non-unique nodes in the sample, we produce a size estimate. Then, we show analytically that the estimate error vanishes with high probability for smaller number of samples than those required by prior-art algorithms. Moreover, although our algorithms are provably correct for any graph, they excel when applied to social network-like graphs. The proposed algorithms were evaluated on synthetic as well real social networks such as Facebook, IMDB, and DBLP. Our experiments corroborated the theoretical results, and demonstrated the effectiveness of the algorithms.

References

  1. Estimating the size of a population. http://www.rsscse.org.uk/ts/gtb/johnson.pdf.Google ScholarGoogle Scholar
  2. Facebook statistics. http://www.facebook.com/press/info.php?statistics.Google ScholarGoogle Scholar
  3. Z. Bar-Yossef and M. Gurevich. Efficient search engine measurements. In Proc. of the 16th intl. conf. on World Wide Web (WWW'07), pages 401--410, Banff, Alberta, Canada, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Z. Bar-Yossef and M. Gurevich. Random sampling from a search engine's index. J. ACM, 55(5), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public Web search engines. Comput. Netw. ISDN Syst., 30(1--7):379--388, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Chao. Estimating the population size for capture-recapture data with unequal catchability. In Biometrics, Dec. 1987.Google ScholarGoogle Scholar
  7. M. Finkelstein, H. G. Tucker, and J. A. Veeh. Confidence intervals for the number of unseen types. Statistics & Probability Letters, 37(4):423--430, Mar. 1998.Google ScholarGoogle ScholarCross RefCross Ref
  8. M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. Walking in Facebook: A Case Study of Unbiased Sampling of OSNs. In Proc. of IEEE INFOCOM '10, San Diego, CA, Mar. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. S. Han Hee, C. Tae Won, D. Vacha, Z. Yin, and Q. Lili. Scalable proximity estimation and link prediction in online social networks. In Proc. of the 9th ACM SIGCOMM conf. on Internet Measurement (IMC'09), pages 322--335, Chicago, IL, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. Huggins, H.-C. Yang, A. Chao, and P. S. F. Yip. Population size estimation using local sample coverage for open populations. Journal of Statistical Planning and Inference, 113(2):699 -- 714, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  11. D. E. Knuth. Estimating the efficiency of backtrack programs. Technical report, Stanford, CA, 1974. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Y. Koren, S. C. North, and C. Volinsky. Measuring and extracting proximity in networks. In Proc. of the 12th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data mining (KDD'06), pages 245--255, Philadelphia, PA, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Kurant, M. Gjoka, C. T. Butts, and A. Markopoulou. Walking on a Graph with a Magnifying Glass. In Proc. of ACM SIGMETRICS '11, San Jose, CA, Jun. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. L. Lovasz. Random walks on graphs. a survey. Combinatorics, 1993.Google ScholarGoogle Scholar
  15. A. Marchetti-Spaccamela. On the estimate of the size of a directed graph. In J. van Leeuwen, editor, Graph-Theoretic Concepts in Computer Science, volume 344 of Lecture Notes in Computer Science, pages 317--326. Springer Berlin / Heidelberg, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In Proc. of the 5th ACM Conf. on Internet Measurement (IMC'07), San Diego, CA, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. R. Motwani and S. Vassilvitskii. Distinct values estimators for power law distributions. In Proc. of the Third Workshop on Analytic Algorithmics and Combinatorics (ANALCO'06), Miami, FL, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  18. L. Pitt. A note on extending knuth's tree estimator to directed acyclic graphs. Inf. Process. Lett., 24(3):203--206, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S. Ye and F. Wu. Estimating the size of online social networks. In Proc. of the IEEE 2nd Intl. Conf. on Social Computing (SocialCom), pages 169--176, Aug. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. A. Yong-Yeol, H. Seungyeop, K. Haewoon, M. Sue, and J. Hawoong. Analysis of topological characteristics of huge online social networking services. In Proc. of the 16th Intl. Conf. on World Wide Web (WWW'07), pages 835--844, Banff, Alberta, Canada, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Estimating sizes of social networks via biased sampling

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Other conferences
      WWW '11: Proceedings of the 20th international conference on World wide web
      March 2011
      840 pages
      ISBN:9781450306324
      DOI:10.1145/1963405

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 28 March 2011

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate1,899of8,196submissions,23%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader