Skip to main content
Top
Published in:

01-12-2023 | Original Article

Empirical characterization of graph sampling algorithms

Authors: Muhammad Irfan Yousuf, Izza Anwer, Raheel Anwar

Published in: Social Network Analysis and Mining | Issue 1/2023

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Graph sampling allows mining a small representative subgraph from a big graph. Sampling algorithms deploy different strategies to replicate the properties of a given graph in the sampled graph. In this study, we provide a comprehensive empirical characterization of five graph sampling algorithms on six properties of a graph including degree, clustering coefficient, path length, global clustering coefficient, assortativity, and modularity. We extract samples from fifteen graphs grouped into five categories including collaboration, social, citation, technological, and synthetic graphs. We provide both qualitative and quantitative results. We find that there is no single method that extracts true samples from a given graph with respect to the properties tested in this work. Our results show that the sampling algorithm that aggressively explores the neighborhood of a sampled node performs better than the others.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
go back to reference Becchetti L, Castillo C, Donato D, Fazzone A (2006) A comparison of sampling techniques for web graph characterization. In: LinkKDD Becchetti L, Castillo C, Donato D, Fazzone A (2006) A comparison of sampling techniques for web graph characterization. In: LinkKDD
go back to reference Benevenuto F, Rodrigues T, Cha M, Almeida V (2009) Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, pp 49–62 Benevenuto F, Rodrigues T, Cha M, Almeida V (2009) Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, pp 49–62
go back to reference Doerr C, Blenn N (2013) Metric convergence in social network sampling. In ACM Hotplanet Doerr C, Blenn N (2013) Metric convergence in social network sampling. In ACM Hotplanet
go back to reference Gjoka M, Kurant M, Butts C, Markopoulou A (2010) Walking in Facebook: a case study of unbiased sampling of OSNS. INFOCOM Gjoka M, Kurant M, Butts C, Markopoulou A (2010) Walking in Facebook: a case study of unbiased sampling of OSNS. INFOCOM
go back to reference Gkantsidis C, Mihail M, Saberi A (2006) Random walks in peer-to-peer networks: algorithms and evaluation. Perform Eval 63(3):241–263CrossRef Gkantsidis C, Mihail M, Saberi A (2006) Random walks in peer-to-peer networks: algorithms and evaluation. Perform Eval 63(3):241–263CrossRef
go back to reference Kwak H, Lee C, Park H, Moon S(2010) What is twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, pp 591–600 Kwak H, Lee C, Park H, Moon S(2010) What is twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, pp 591–600
go back to reference Lee S, Kim P, Jeong H (2006) Statistical Properties of Sampled Networks. Phys Rev E 73:016102CrossRef Lee S, Kim P, Jeong H (2006) Statistical Properties of Sampled Networks. Phys Rev E 73:016102CrossRef
go back to reference Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: Densification and shrinking diameters. ACM Trans Knowl Discov Data 1(1) Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: Densification and shrinking diameters. ACM Trans Knowl Discov Data 1(1)
go back to reference Liu L, Wang L, Wu W, Jia H, Zhang Y (2019) A novel hybrid-jump-based sampling method for complex social networks. IEEE Trans Comput Soc Syst 6(2):241–249CrossRef Liu L, Wang L, Wu W, Jia H, Zhang Y (2019) A novel hybrid-jump-based sampling method for complex social networks. IEEE Trans Comput Soc Syst 6(2):241–249CrossRef
go back to reference Maiya A.S, Berger-Wolf T.Y (2010) Sampling community structure. In: Proceedings of the 19th International Conference on World Wide Web. WWW ’10, pp 701–710 Maiya A.S, Berger-Wolf T.Y (2010) Sampling community structure. In: Proceedings of the 19th International Conference on World Wide Web. WWW ’10, pp 701–710
go back to reference Newman ME (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103:8577–8582CrossRef Newman ME (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103:8577–8582CrossRef
go back to reference Rasti AH, Torkjazi M, Rejaie R, Duffield NG, Willinger W, Stutzbach D (2009) Respondent-driven sampling for characterizing unstructured overlays. In: INFOCOM 2009. 28th IEEE International Conference on Computer Communications, 19-25 April 2009, Rio de Janeiro, Brazil, pp 2701–2705 Rasti AH, Torkjazi M, Rejaie R, Duffield NG, Willinger W, Stutzbach D (2009) Respondent-driven sampling for characterizing unstructured overlays. In: INFOCOM 2009. 28th IEEE International Conference on Computer Communications, 19-25 April 2009, Rio de Janeiro, Brazil, pp 2701–2705
go back to reference Ribeeiro B, Towsley D (2010) Estimating and Sampling Graphs with Multidimensional Random Walks. In ACM Internet Measurement Conference Ribeeiro B, Towsley D (2010) Estimating and Sampling Graphs with Multidimensional Random Walks. In ACM Internet Measurement Conference
go back to reference Ribeiro B, Towsley D (2010) Estimating and sampling graphs with multidimensional random walks. In: Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, pp 390–403 Ribeiro B, Towsley D (2010) Estimating and sampling graphs with multidimensional random walks. In: Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, pp 390–403
go back to reference Stutzbach D, Rejaie R, Duffield N, Sen S, Willinger W (2009) On unbiased sampling for unstructured peer-to-peer networks. IEEE/ACM Trans Netw 17(2):377–390CrossRef Stutzbach D, Rejaie R, Duffield N, Sen S, Willinger W (2009) On unbiased sampling for unstructured peer-to-peer networks. IEEE/ACM Trans Netw 17(2):377–390CrossRef
go back to reference Voudigari E, Salamanos N, Papageorgiou T, Yannakoudakis E.J (2016) Rank degree: an efficient algorithm for graph sampling. In: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp 120–129 Voudigari E, Salamanos N, Papageorgiou T, Yannakoudakis E.J (2016) Rank degree: an efficient algorithm for graph sampling. In: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp 120–129
go back to reference Wang T, Chen Y, Zhang Z, Xu T, Jin L, Hui P, Deng B, Li X (2011) Understanding graph sampling algorithms for social network analysis. In: Proceedings of the 2011 31st International Conference on Distributed Computing Systems Workshops. ICDCSW ’11, pp 123–128 Wang T, Chen Y, Zhang Z, Xu T, Jin L, Hui P, Deng B, Li X (2011) Understanding graph sampling algorithms for social network analysis. In: Proceedings of the 2011 31st International Conference on Distributed Computing Systems Workshops. ICDCSW ’11, pp 123–128
go back to reference Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442CrossRef Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442CrossRef
go back to reference Wilson C, Boe B, Sala A, Puttaswamy K.P.N, Zhao B.Y (2009) User interactions in social networks and their implications. In: Proceedings of the 4th ACM European Conference on Computer Systems, pp 205–218 Wilson C, Boe B, Sala A, Puttaswamy K.P.N, Zhao B.Y (2009) User interactions in social networks and their implications. In: Proceedings of the 4th ACM European Conference on Computer Systems, pp 205–218
go back to reference Yousuf MI, Kim S (2018) List sampling for large graphs. Intell Data Anal 22:261–295CrossRef Yousuf MI, Kim S (2018) List sampling for large graphs. Intell Data Anal 22:261–295CrossRef
go back to reference Yousuf MI, Kim S (2020) Generating graphs by creating associative and random links between existing nodes. J Stat Phys 179:1–32MathSciNetCrossRef Yousuf MI, Kim S (2020) Generating graphs by creating associative and random links between existing nodes. J Stat Phys 179:1–32MathSciNetCrossRef
Metadata
Title
Empirical characterization of graph sampling algorithms
Authors
Muhammad Irfan Yousuf
Izza Anwer
Raheel Anwar
Publication date
01-12-2023
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2023
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-023-01060-5

Premium Partner