research-article

Estimating sizes of social networks via biased sampling

Authors:
Liran Katzir

Yahoo! Labs., Haifa, Israel

Yahoo! Labs., Haifa, Israel
View Profile

,
Edo Liberty

Yahoo! Labs., Haifa, Israel

Yahoo! Labs., Haifa, Israel
View Profile

,
Oren Somekh

Yahoo! Labs., Haifa, Israel

Yahoo! Labs., Haifa, Israel
View Profile

WWW '11: Proceedings of the 20th international conference on World wide webMarch 2011Pages 597–606https://doi.org/10.1145/1963405.1963489

Published:28 March 2011Publication History

WWW '11: Proceedings of the 20th international conference on World wide web

Pages 597–606

ABSTRACT

Online social networks have become very popular in recent years and their number of users is already measured in many hundreds of millions. For various commercial and sociological purposes, an independent estimate of their sizes is important. In this work, algorithms for estimating the number of users in such networks are considered. The proposed schemes are also applicable for estimating the sizes of networks' sub-populations. The suggested algorithms interact with the social networks via their public APIs only, and rely on no other external information. Due to obvious traffic and privacy concerns, the number of such interactions is severely limited. We therefore focus on minimizing the number of API interactions needed for producing good size estimates. We adopt the abstraction of social networks as undirected graphs and use random node sampling. By counting the number of collisions or non-unique nodes in the sample, we produce a size estimate. Then, we show analytically that the estimate error vanishes with high probability for smaller number of samples than those required by prior-art algorithms. Moreover, although our algorithms are provably correct for any graph, they excel when applied to social network-like graphs. The proposed algorithms were evaluated on synthetic as well real social networks such as Facebook, IMDB, and DBLP. Our experiments corroborated the theoretical results, and demonstrated the effectiveness of the algorithms.

References

Estimating the size of a population. http://www.rsscse.org.uk/ts/gtb/johnson.pdf.Google Scholar
Facebook statistics. http://www.facebook.com/press/info.php?statistics.Google Scholar
Z. Bar-Yossef and M. Gurevich. Efficient search engine measurements. In Proc. of the 16th intl. conf. on World Wide Web (WWW'07), pages 401--410, Banff, Alberta, Canada, 2007. Google ScholarDigital Library
Z. Bar-Yossef and M. Gurevich. Random sampling from a search engine's index. J. ACM, 55(5), 2008. Google ScholarDigital Library
K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public Web search engines. Comput. Netw. ISDN Syst., 30(1--7):379--388, 1998. Google ScholarDigital Library
A. Chao. Estimating the population size for capture-recapture data with unequal catchability. In Biometrics, Dec. 1987.Google Scholar
M. Finkelstein, H. G. Tucker, and J. A. Veeh. Confidence intervals for the number of unseen types. Statistics & Probability Letters, 37(4):423--430, Mar. 1998.Google ScholarCross Ref
M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. Walking in Facebook: A Case Study of Unbiased Sampling of OSNs. In Proc. of IEEE INFOCOM '10, San Diego, CA, Mar. 2010. Google ScholarDigital Library
S. Han Hee, C. Tae Won, D. Vacha, Z. Yin, and Q. Lili. Scalable proximity estimation and link prediction in online social networks. In Proc. of the 9th ACM SIGCOMM conf. on Internet Measurement (IMC'09), pages 322--335, Chicago, IL, 2009. Google ScholarDigital Library
R. Huggins, H.-C. Yang, A. Chao, and P. S. F. Yip. Population size estimation using local sample coverage for open populations. Journal of Statistical Planning and Inference, 113(2):699 -- 714, 2003.Google ScholarCross Ref
D. E. Knuth. Estimating the efficiency of backtrack programs. Technical report, Stanford, CA, 1974. Google ScholarDigital Library
Y. Koren, S. C. North, and C. Volinsky. Measuring and extracting proximity in networks. In Proc. of the 12th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data mining (KDD'06), pages 245--255, Philadelphia, PA, 2006. Google ScholarDigital Library
M. Kurant, M. Gjoka, C. T. Butts, and A. Markopoulou. Walking on a Graph with a Magnifying Glass. In Proc. of ACM SIGMETRICS '11, San Jose, CA, Jun. 2011. Google ScholarDigital Library
L. Lovasz. Random walks on graphs. a survey. Combinatorics, 1993.Google Scholar
A. Marchetti-Spaccamela. On the estimate of the size of a directed graph. In J. van Leeuwen, editor, Graph-Theoretic Concepts in Computer Science, volume 344 of Lecture Notes in Computer Science, pages 317--326. Springer Berlin / Heidelberg, 1989. Google ScholarDigital Library
A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In Proc. of the 5th ACM Conf. on Internet Measurement (IMC'07), San Diego, CA, 2007. Google ScholarDigital Library
R. Motwani and S. Vassilvitskii. Distinct values estimators for power law distributions. In Proc. of the Third Workshop on Analytic Algorithmics and Combinatorics (ANALCO'06), Miami, FL, 2006.Google ScholarCross Ref
L. Pitt. A note on extending knuth's tree estimator to directed acyclic graphs. Inf. Process. Lett., 24(3):203--206, 1987. Google ScholarDigital Library
S. Ye and F. Wu. Estimating the size of online social networks. In Proc. of the IEEE 2nd Intl. Conf. on Social Computing (SocialCom), pages 169--176, Aug. 2010. Google ScholarDigital Library
A. Yong-Yeol, H. Seungyeop, K. Haewoon, M. Sue, and J. Hawoong. Analysis of topological characteristics of huge online social networking services. In Proc. of the 16th Intl. Conf. on World Wide Web (WWW'07), pages 835--844, Banff, Alberta, Canada, 2007. Google ScholarDigital Library

Index Terms

Estimating sizes of social networks via biased sampling
1. Theory of computation
  1. Design and analysis of algorithms

Recommendations

Estimating clustering coefficients and size of social networks via random walk
WWW '13: Proceedings of the 22nd international conference on World Wide Web

Online social networks have become a major force in today's society and economy. The largest of today's social networks may have hundreds of millions to more than a billion users. Such networks are too large to be downloaded or stored locally, even if ...
Read More
Estimating Clustering Coefficients and Size of Social Networks via Random Walk

This work addresses the problem of estimating social network measures. Specifically, the measures at hand are the network average and global clustering coefficients and the number of registered users. The algorithms at hand (1) assume no prior knowledge ...
Read More
Analysis of topological characteristics of huge online social networking services
WWW '07: Proceedings of the 16th international conference on World Wide Web

Social networking services are a fast-growing business in the Internet. However, it is unknown if online relationships and their growth patterns are the same as in real-life social networks. In this paper, we compare the structures of three online ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
WWW '11: Proceedings of the 20th international conference on World wide web
March 2011
840 pages
ISBN:9781450306324
DOI:10.1145/1963405
General Chairs:
S. Sadagopan
IIIT-Bangalore, India
,
Krithi Ramamritham
IIT-Bombay, India
,
Arun Kumar
IBM Research, India
,
M. P. Ravindra
Infosys E & R, India
,
Program Chairs:
Elisa Bertino
Purdue University, USA
,
Ravi Kumar
Yahoo! Research, USA
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 28 March 2011
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
population estimator
sampling
social network
undirected graph
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate1,899of8,196submissions,23%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 98
  Total Citations
  View Citations
- 723
  Total Downloads
- Downloads (Last 12 months)32
- Downloads (Last 6 weeks)5
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Estimating sizes of social networks via biased sampling

WWW '11: Proceedings of the 20th international conference on World wide web

ABSTRACT

References

Cited By

Index Terms

Recommendations

Estimating clustering coefficients and size of social networks via random walk

Estimating Clustering Coefficients and Size of Social Networks via Random Walk

Analysis of topological characteristics of huge online social networking services