Skip to main content
Top
Published in: Social Network Analysis and Mining 4/2013

01-12-2013 | Original Article

Spiraling Facebook: an alternative Metropolis–Hastings random walk using a spiral proposal distribution

Authors: C. A. Piña-García, Dongbing Gu

Published in: Social Network Analysis and Mining | Issue 4/2013

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Sampling the content of an Online Social Network (OSN) is a major application area due to the growing interest in collecting social information e.g., email, location, age and number of friends. Large-scale social networks such as Facebook can be difficult to sample due to the amount of data and the privacy settings imposed by this company. Sampling techniques require the development of reliable algorithms able to cope with an unknown environment. Our main purpose in this manuscript is to examine whether it is possible to switch the normal distribution of the Metropolis–Hasting random walk (MHRW) by using a spiral approach as an alternative and reliable distribution. We propose a sampling algorithm, the Alternative Metropolis–Hasting random walk AMHRW, to study the effect of collecting digital profiles on two different datasets. We examine the soundness and robustness of the proposed algorithm through independent walks on two different representative samples of Facebook. We observe that normal distribution performance can be approximated by means of the use of an Illusion spiral. Similarly, we provide a formal convergence analysis to evaluate the performance of our independent walks and to evaluate whether the sample of draws has attained an equilibrium state. Finally, our preliminary results provide experimental evidence that collecting data with the AMHRW algorithm can be equally effective as the MHRW algorithm on large-scale networks.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Footnotes
1
When we sample with replacement, the sample values are independent.
 
2
A full analysis of the whole structure of Facebook can be found in (Ugander et al. 2011).
 
Literature
go back to reference Akkermans H (2012) Web dynamics as a random walk: how and why power laws occur Akkermans H (2012) Web dynamics as a random walk: how and why power laws occur
go back to reference Allison P (2010) Survival analysis using SAS: a practical guide. Sas Institute, Cary Allison P (2010) Survival analysis using SAS: a practical guide. Sas Institute, Cary
go back to reference Backstrom L, Leskovec J (2011) Supervised random walks: predicting and recommending links in social networks. In: Proceedings of the 4th ACM international conference on Web search and data mining, ACM, pp 635–644 Backstrom L, Leskovec J (2011) Supervised random walks: predicting and recommending links in social networks. In: Proceedings of the 4th ACM international conference on Web search and data mining, ACM, pp 635–644
go back to reference Bar-Yossef Z, Gurevich M (2008) Random sampling from a search engine’s index. J ACM 55(5):24 Bar-Yossef Z, Gurevich M (2008) Random sampling from a search engine’s index. J ACM 55(5):24
go back to reference Beskos A, Stuart A (2009) Computational complexity of Metropolis–Hastings methods in high dimensions. In: Monte Carlo and Quasi-Monte Carlo methods 2008, Springer, Berlin, pp 61–71 Beskos A, Stuart A (2009) Computational complexity of Metropolis–Hastings methods in high dimensions. In: Monte Carlo and Quasi-Monte Carlo methods 2008, Springer, Berlin, pp 61–71
go back to reference Best N, Cowles M, Vines K (1995) Coda* convergence diagnosis and output analysis software for gibbs sampling output version 0.30. MRC Biostatistics Unit, Cambridge Best N, Cowles M, Vines K (1995) Coda* convergence diagnosis and output analysis software for gibbs sampling output version 0.30. MRC Biostatistics Unit, Cambridge
go back to reference Bhattacharyya P, Garg A, Wu SF (2011) Analysis of user keyword similarity in online social networks. Soc Netw Anal Min 1(3):143–158CrossRef Bhattacharyya P, Garg A, Wu SF (2011) Analysis of user keyword similarity in online social networks. Soc Netw Anal Min 1(3):143–158CrossRef
go back to reference Caci B, Cardaci M, Tabacchi ME (2011) Facebook as a small world: a topological hypothesis. Soc Netw Anal Min, pp 1–5 Caci B, Cardaci M, Tabacchi ME (2011) Facebook as a small world: a topological hypothesis. Soc Netw Anal Min, pp 1–5
go back to reference Catanese S, De Meo P, Ferrara E, Fiumara G, Provetti A (2011) Crawling facebook for social network analysis purposes. Arxiv preprint arXiv:1105.6307 Catanese S, De Meo P, Ferrara E, Fiumara G, Provetti A (2011) Crawling facebook for social network analysis purposes. Arxiv preprint arXiv:1105.6307
go back to reference Chapra S, Canale R (2010) Numerical methods for engineers, vol 2. McGraw-Hill, New York Chapra S, Canale R (2010) Numerical methods for engineers, vol 2. McGraw-Hill, New York
go back to reference Codling E, Bearon R, Thorn G (2010) Diffusion about the mean drift location in a biased random walk. Ecology 91(10):3106–3113CrossRef Codling E, Bearon R, Thorn G (2010) Diffusion about the mean drift location in a biased random walk. Ecology 91(10):3106–3113CrossRef
go back to reference Cook T (1979) The curves of life. Dover Publications, New York Cook T (1979) The curves of life. Dover Publications, New York
go back to reference Cutillo L, Molva R, Onen M (2011) Analysis of privacy in online social networks from the graph theory perspective. In: Proceedings of the global telecommunications conference (GLOBECOM 2011), IEEE, pp 1–5 Cutillo L, Molva R, Onen M (2011) Analysis of privacy in online social networks from the graph theory perspective. In: Proceedings of the global telecommunications conference (GLOBECOM 2011), IEEE, pp 1–5
go back to reference Davis P, Gautschi W, Iserles A (1993) Spirals: from Theodorus to chaos. AK Peters, Wellesley Davis P, Gautschi W, Iserles A (1993) Spirals: from Theodorus to chaos. AK Peters, Wellesley
go back to reference Dudewicz E (1976) Introduction to statistics and probability. Holt, Rinehart and Winston Dudewicz E (1976) Introduction to statistics and probability. Holt, Rinehart and Winston
go back to reference Ferri F, Grifoni P, Guzzo T (2012) New forms of social and professional digital relationships: the case of facebook. Soc Netw Anal Min 2(2):121–137CrossRef Ferri F, Grifoni P, Guzzo T (2012) New forms of social and professional digital relationships: the case of facebook. Soc Netw Anal Min 2(2):121–137CrossRef
go back to reference Geweke J et al (1991) Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. Research Department, Federal Reserve Bank of Minneapolis Geweke J et al (1991) Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. Research Department, Federal Reserve Bank of Minneapolis
go back to reference Gjoka M, Kurant M, Butts C, Markopoulou A (2011) Practical recommendations on crawling online social networks. IEEE J Sel Areas Commun 29(9):1872–1892CrossRef Gjoka M, Kurant M, Butts C, Markopoulou A (2011) Practical recommendations on crawling online social networks. IEEE J Sel Areas Commun 29(9):1872–1892CrossRef
go back to reference Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in Facebook: a case study of unbiased sampling of OSNs. In: Proceedings of IEEE INFOCOM ’10. San Diego, CA Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in Facebook: a case study of unbiased sampling of OSNs. In: Proceedings of IEEE INFOCOM ’10. San Diego, CA
go back to reference Hargittai I (1992) Spiral symmetry. World Scientific Publishing Company Incorporated, Singapore Hargittai I (1992) Spiral symmetry. World Scientific Publishing Company Incorporated, Singapore
go back to reference Karatzas I, Shreve S (1991) Brownian motion and stochastic calculus, vol 113, Springer, Berlin Karatzas I, Shreve S (1991) Brownian motion and stochastic calculus, vol 113, Springer, Berlin
go back to reference Katzir L, Liberty E, Somekh O (2011) Estimating sizes of social networks via biased sampling. In: Proceedings of the 20th international conference on World wide web, ACM, pp 597–606 Katzir L, Liberty E, Somekh O (2011) Estimating sizes of social networks via biased sampling. In: Proceedings of the 20th international conference on World wide web, ACM, pp 597–606
go back to reference Kurant M, Gjoka M, Butts CT, Markopoulou A (2011) Walking on a graph with a magnifying glass: stratified sampling via weighted random walks. In: Proceedings of ACM SIGMETRICS ’11. San Jose, CA Kurant M, Gjoka M, Butts CT, Markopoulou A (2011) Walking on a graph with a magnifying glass: stratified sampling via weighted random walks. In: Proceedings of ACM SIGMETRICS ’11. San Jose, CA
go back to reference Lafore R, Waite M (2003) Data structures and algorithms in Java. Sams Publishing Lafore R, Waite M (2003) Data structures and algorithms in Java. Sams Publishing
go back to reference LeSage J (1999) Applied econometrics using matlab. Manuscript, Department of Economics, University of Toronto LeSage J (1999) Applied econometrics using matlab. Manuscript, Department of Economics, University of Toronto
go back to reference Martinez W, Martinez A (2001) Computational statistics handbook with MATLAB, vol 2. Chapman and Hall/CRC Martinez W, Martinez A (2001) Computational statistics handbook with MATLAB, vol 2. Chapman and Hall/CRC
go back to reference Mislove A, Gummadi KP, Druschel P (2006) Exploiting social networks for internet search. In: Proceedings of the 5th workshop on hot topics in networks (HotNets06). Citeseer, p 79 Mislove A, Gummadi KP, Druschel P (2006) Exploiting social networks for internet search. In: Proceedings of the 5th workshop on hot topics in networks (HotNets06). Citeseer, p 79
go back to reference Papagelis M, Das G, Koudas N (2011) Sampling online social networks Papagelis M, Das G, Koudas N (2011) Sampling online social networks
go back to reference Ribeiro B, Towsley D (2010) Estimating and sampling graphs with multidimensional random walks. In: Proceedings of the 10th annual conference on Internet measurement, ACM, pp 390–403 Ribeiro B, Towsley D (2010) Estimating and sampling graphs with multidimensional random walks. In: Proceedings of the 10th annual conference on Internet measurement, ACM, pp 390–403
go back to reference Robert C, Casella G (2009) Introducing Monte Carlo methods with R. Springer, Berlin Robert C, Casella G (2009) Introducing Monte Carlo methods with R. Springer, Berlin
go back to reference Scott J (2011) Social network analysis: developments, advances, and prospects. Soc Netw Anal Min 1(1):21–26CrossRef Scott J (2011) Social network analysis: developments, advances, and prospects. Soc Netw Anal Min 1(1):21–26CrossRef
go back to reference Stuckman J, Purtilo J (2011) Analyzing the wikisphere: methodology and data to support quantitative wiki research. J Am Soc Inf Sci Technol 62(8):1564–1576CrossRef Stuckman J, Purtilo J (2011) Analyzing the wikisphere: methodology and data to support quantitative wiki research. J Am Soc Inf Sci Technol 62(8):1564–1576CrossRef
go back to reference Tang J, Musolesi M, Mascolo C, Latora V (2009) Temporal distance metrics for social network analysis. In: Proceedings of the 2nd ACM workshop on Online social networks, ACM, pp 31–36 Tang J, Musolesi M, Mascolo C, Latora V (2009) Temporal distance metrics for social network analysis. In: Proceedings of the 2nd ACM workshop on Online social networks, ACM, pp 31–36
go back to reference Ugander J, Karrer B, Backstrom L, Marlow C (2011) The anatomy of the facebook social graph. Arxiv preprint arXiv:1111.4503 Ugander J, Karrer B, Backstrom L, Marlow C (2011) The anatomy of the facebook social graph. Arxiv preprint arXiv:1111.4503
go back to reference Viswanathan G, Buldyrev S, Havlin S, Da Luz M, Raposo E, Stanley H (1999) Optimizing the success of random searches. Nature 401(6756):911–914CrossRef Viswanathan G, Buldyrev S, Havlin S, Da Luz M, Raposo E, Stanley H (1999) Optimizing the success of random searches. Nature 401(6756):911–914CrossRef
Metadata
Title
Spiraling Facebook: an alternative Metropolis–Hastings random walk using a spiral proposal distribution
Authors
C. A. Piña-García
Dongbing Gu
Publication date
01-12-2013
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 4/2013
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-013-0126-8

Other articles of this Issue 4/2013

Social Network Analysis and Mining 4/2013 Go to the issue

Premium Partner