ABSTRACT
Online social networks have become very popular in recent years and their number of users is already measured in many hundreds of millions. For various commercial and sociological purposes, an independent estimate of their sizes is important. In this work, algorithms for estimating the number of users in such networks are considered. The proposed schemes are also applicable for estimating the sizes of networks' sub-populations. The suggested algorithms interact with the social networks via their public APIs only, and rely on no other external information. Due to obvious traffic and privacy concerns, the number of such interactions is severely limited. We therefore focus on minimizing the number of API interactions needed for producing good size estimates. We adopt the abstraction of social networks as undirected graphs and use random node sampling. By counting the number of collisions or non-unique nodes in the sample, we produce a size estimate. Then, we show analytically that the estimate error vanishes with high probability for smaller number of samples than those required by prior-art algorithms. Moreover, although our algorithms are provably correct for any graph, they excel when applied to social network-like graphs. The proposed algorithms were evaluated on synthetic as well real social networks such as Facebook, IMDB, and DBLP. Our experiments corroborated the theoretical results, and demonstrated the effectiveness of the algorithms.
- Estimating the size of a population. http://www.rsscse.org.uk/ts/gtb/johnson.pdf.Google Scholar
- Facebook statistics. http://www.facebook.com/press/info.php?statistics.Google Scholar
- Z. Bar-Yossef and M. Gurevich. Efficient search engine measurements. In Proc. of the 16th intl. conf. on World Wide Web (WWW'07), pages 401--410, Banff, Alberta, Canada, 2007. Google ScholarDigital Library
- Z. Bar-Yossef and M. Gurevich. Random sampling from a search engine's index. J. ACM, 55(5), 2008. Google ScholarDigital Library
- K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public Web search engines. Comput. Netw. ISDN Syst., 30(1--7):379--388, 1998. Google ScholarDigital Library
- A. Chao. Estimating the population size for capture-recapture data with unequal catchability. In Biometrics, Dec. 1987.Google Scholar
- M. Finkelstein, H. G. Tucker, and J. A. Veeh. Confidence intervals for the number of unseen types. Statistics & Probability Letters, 37(4):423--430, Mar. 1998.Google ScholarCross Ref
- M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. Walking in Facebook: A Case Study of Unbiased Sampling of OSNs. In Proc. of IEEE INFOCOM '10, San Diego, CA, Mar. 2010. Google ScholarDigital Library
- S. Han Hee, C. Tae Won, D. Vacha, Z. Yin, and Q. Lili. Scalable proximity estimation and link prediction in online social networks. In Proc. of the 9th ACM SIGCOMM conf. on Internet Measurement (IMC'09), pages 322--335, Chicago, IL, 2009. Google ScholarDigital Library
- R. Huggins, H.-C. Yang, A. Chao, and P. S. F. Yip. Population size estimation using local sample coverage for open populations. Journal of Statistical Planning and Inference, 113(2):699 -- 714, 2003.Google ScholarCross Ref
- D. E. Knuth. Estimating the efficiency of backtrack programs. Technical report, Stanford, CA, 1974. Google ScholarDigital Library
- Y. Koren, S. C. North, and C. Volinsky. Measuring and extracting proximity in networks. In Proc. of the 12th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data mining (KDD'06), pages 245--255, Philadelphia, PA, 2006. Google ScholarDigital Library
- M. Kurant, M. Gjoka, C. T. Butts, and A. Markopoulou. Walking on a Graph with a Magnifying Glass. In Proc. of ACM SIGMETRICS '11, San Jose, CA, Jun. 2011. Google ScholarDigital Library
- L. Lovasz. Random walks on graphs. a survey. Combinatorics, 1993.Google Scholar
- A. Marchetti-Spaccamela. On the estimate of the size of a directed graph. In J. van Leeuwen, editor, Graph-Theoretic Concepts in Computer Science, volume 344 of Lecture Notes in Computer Science, pages 317--326. Springer Berlin / Heidelberg, 1989. Google ScholarDigital Library
- A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In Proc. of the 5th ACM Conf. on Internet Measurement (IMC'07), San Diego, CA, 2007. Google ScholarDigital Library
- R. Motwani and S. Vassilvitskii. Distinct values estimators for power law distributions. In Proc. of the Third Workshop on Analytic Algorithmics and Combinatorics (ANALCO'06), Miami, FL, 2006.Google ScholarCross Ref
- L. Pitt. A note on extending knuth's tree estimator to directed acyclic graphs. Inf. Process. Lett., 24(3):203--206, 1987. Google ScholarDigital Library
- S. Ye and F. Wu. Estimating the size of online social networks. In Proc. of the IEEE 2nd Intl. Conf. on Social Computing (SocialCom), pages 169--176, Aug. 2010. Google ScholarDigital Library
- A. Yong-Yeol, H. Seungyeop, K. Haewoon, M. Sue, and J. Hawoong. Analysis of topological characteristics of huge online social networking services. In Proc. of the 16th Intl. Conf. on World Wide Web (WWW'07), pages 835--844, Banff, Alberta, Canada, 2007. Google ScholarDigital Library
Index Terms
- Estimating sizes of social networks via biased sampling
Recommendations
Estimating clustering coefficients and size of social networks via random walk
WWW '13: Proceedings of the 22nd international conference on World Wide WebOnline social networks have become a major force in today's society and economy. The largest of today's social networks may have hundreds of millions to more than a billion users. Such networks are too large to be downloaded or stored locally, even if ...
Estimating Clustering Coefficients and Size of Social Networks via Random Walk
This work addresses the problem of estimating social network measures. Specifically, the measures at hand are the network average and global clustering coefficients and the number of registered users. The algorithms at hand (1) assume no prior knowledge ...
Analysis of topological characteristics of huge online social networking services
WWW '07: Proceedings of the 16th international conference on World Wide WebSocial networking services are a fast-growing business in the Internet. However, it is unknown if online relationships and their growth patterns are the same as in real-life social networks. In this paper, we compare the structures of three online ...
Comments