
2018 | Original Paper | Book Chapter

Extension Sampling Designs for Big Networks: Application to Twitter

Authored by: A. Rebecq

Published in: Nonparametric Statistics

Publisher: Springer International Publishing


Abstract

With the rise of big data, statistical network analysis is attracting more and more attention. However, computing many statistics of interest exactly is prohibitively costly for big graphs, so statistical estimators can be preferable. Model-based estimators for networks have some drawbacks. We study design-based estimates relying on sampling methods that were developed specifically for graph populations. In this contribution, we test sampling designs that can be described as "extension" sampling designs. Units are selected in two phases: in the first phase, simple designs such as Bernoulli sampling are used; in the second phase, further units are selected among those linked to the units in the first-phase sample. We test these methods on Twitter data, because the size and structure of the Twitter graph are typical of the big social networks for which such methods would be very useful.
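The two-phase idea can be illustrated with a minimal Python sketch of one simple extension design. It is not necessarily the exact design or estimator used in the chapter, and all function names are hypothetical. Assumptions: the graph is directed and stored as an adjacency dict (`node -> nodes it links to`, e.g. accounts it follows); phase 1 is Bernoulli sampling with probability p; phase 2 adds every node linked to from a phase-1 node. Under these assumptions a node k with m_k distinct in-neighbours ends up in the final sample exactly when k itself or one of those in-neighbours is drawn in phase 1, so its inclusion probability is π_k = 1 - (1 - p)^(1 + m_k), which can be plugged into the Horvitz-Thompson estimator. For convenience the sketch computes in-degrees from the full graph; in practice they would have to be observed for the sampled units (e.g. a follower count, if available).

```python
"""Sketch of a two-phase "extension" sampling design on a directed graph.

Phase 1: Bernoulli sampling of nodes. Phase 2: add every node linked to
from a phase-1 node. Names are illustrative, not taken from the chapter.
"""
import random
from typing import Dict, Hashable, Iterable, List, Set

# Hypothetical adjacency format: node -> nodes it links to (e.g. accounts it follows).
Graph = Dict[Hashable, List[Hashable]]


def bernoulli_sample(nodes: Iterable[Hashable], p: float, rng: random.Random) -> Set[Hashable]:
    """Phase 1: include each node independently with probability p."""
    return {v for v in nodes if rng.random() < p}


def extend(graph: Graph, s1: Set[Hashable]) -> Set[Hashable]:
    """Phase 2: add every node that a phase-1 node links to."""
    sample = set(s1)
    for v in s1:
        sample.update(graph.get(v, ()))
    return sample


def in_neighbours(graph: Graph) -> Dict[Hashable, Set[Hashable]]:
    """Distinct in-neighbours of each node, ignoring self-loops."""
    preds: Dict[Hashable, Set[Hashable]] = {v: set() for v in graph}
    for u, successors in graph.items():
        for w in successors:
            if w != u:
                preds.setdefault(w, set()).add(u)
    return preds


def inclusion_prob(node: Hashable, preds: Dict[Hashable, Set[Hashable]], p: float) -> float:
    """P(node in final sample) = 1 - (1 - p)^(1 + m), m = number of distinct in-neighbours.

    The node is selected iff it or at least one in-neighbour is drawn in phase 1,
    and phase-1 draws are independent.
    """
    m = len(preds.get(node, ()))
    return 1.0 - (1.0 - p) ** (1 + m)


def horvitz_thompson_total(sample: Set[Hashable], y: Dict[Hashable, float],
                           preds: Dict[Hashable, Set[Hashable]], p: float) -> float:
    """Horvitz-Thompson estimate of the population total of y."""
    return sum(y[k] / inclusion_prob(k, preds, p) for k in sample)


if __name__ == "__main__":
    # Toy follow graph and study variable (illustrative values only).
    g: Graph = {"a": ["b", "c"], "b": ["c"], "c": [], "d": ["a"], "e": []}
    y = {"a": 3.0, "b": 1.0, "c": 4.0, "d": 1.0, "e": 5.0}
    p = 0.3
    rng = random.Random(2018)
    s1 = bernoulli_sample(g, p, rng)
    s = extend(g, s1)
    print("phase 1:", sorted(s1), "final:", sorted(s))
    print("HT estimate:", horvitz_thompson_total(s, y, in_neighbours(g), p),
          "true total:", sum(y.values()))
```

Setting y_k = 1 for every node turns the same estimator into an estimate of the number of nodes. Because π_k grows with in-degree, heavily followed nodes are almost always captured by the extension phase while nodes without in-neighbours keep probability p; the Horvitz-Thompson weights 1/π_k correct for exactly this unequal-probability structure.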

Metadata
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-96941-1_17