Skip to main content
Erschienen in: Social Network Analysis and Mining 1/2024

01.12.2024 | Original Article

An adaptive graph sampling framework for graph analytics

verfasst von: Kewen Wang

Erschienen in: Social Network Analysis and Mining | Ausgabe 1/2024

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In large-scale data processing, graph analytics of complex interaction networks are indispensable. As the whole graph processing and analytics can be inefficient and usually impractical, graph sampling by keeping a portion of the original graph becomes a favorable approach. While prior work focused on fixed edge and node selection strategy based on predetermined criteria, without adaptive feedback to adjust the sampling process, this type of sampling algorithms has limited flexibility and estimation accuracy for complex graphs. In this paper, we propose an adaptive graph sampling framework, and design AdapES, an adaptive edge sampling algorithm based on this framework. Compared to non-adaptive sampling methods, our approach can continually monitor the difference between the current sampled subgraph and the original graph, and dynamically adjust the edge sampling probability based on this observed sampling difference. Guided by a preset sampling goal, this algorithm automatically adapts to the fluctuations in the random sampling process with high flexibility. The experimental evaluation in 11 datasets demonstrates that AdapES outperforms other algorithms for preserving various graph properties and statistics.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Abu-El-Haija S, Fatemi B, Axiotis K, Bulut N, Gasteiger J, Dillon JV, Perozzi B, Bateni M (2023) Submix: learning to mix graph sampling heuristics. In: The 39th conference on uncertainty in artificial intelligence Abu-El-Haija S, Fatemi B, Axiotis K, Bulut N, Gasteiger J, Dillon JV, Perozzi B, Bateni M (2023) Submix: learning to mix graph sampling heuristics. In: The 39th conference on uncertainty in artificial intelligence
Zurück zum Zitat Ahmed NK, Neville J, Kompella R (2013) Network sampling: from static to streaming graphs. ACM Trans Knowl Discov Data (TKDD) 8(2):1–56 Ahmed NK, Neville J, Kompella R (2013) Network sampling: from static to streaming graphs. ACM Trans Knowl Discov Data (TKDD) 8(2):1–56
Zurück zum Zitat Ahmed NK, Duffield N, Neville J, Kompella R (2014) Graph sample and hold: a framework for big-graph analytics. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 1446–1455 Ahmed NK, Duffield N, Neville J, Kompella R (2014) Graph sample and hold: a framework for big-graph analytics. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 1446–1455
Zurück zum Zitat Alev VL, Lau LC (2020) Improved analysis of higher order random walks and applications. In: Proceedings of the 52nd annual ACM SIGACT symposium on theory of computing, pp 1198–1211 Alev VL, Lau LC (2020) Improved analysis of higher order random walks and applications. In: Proceedings of the 52nd annual ACM SIGACT symposium on theory of computing, pp 1198–1211
Zurück zum Zitat Ben-Eliezer O, Eden T, Oren J, Fotakis D (2022) Sampling multiple nodes in large networks: beyond random walks. In: Proceedings of the fifteenth ACM international conference on web search and data mining, pp 37–47 Ben-Eliezer O, Eden T, Oren J, Fotakis D (2022) Sampling multiple nodes in large networks: beyond random walks. In: Proceedings of the fifteenth ACM international conference on web search and data mining, pp 37–47
Zurück zum Zitat Bera SK, Seshadhri C (2020) How to count triangles, without seeing the whole graph. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp 306–316 Bera SK, Seshadhri C (2020) How to count triangles, without seeing the whole graph. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp 306–316
Zurück zum Zitat Chen X, Tan H, Chen Y, He B, Wong WF, Chen D (2021a) ThunderGP: HLS-based graph processing framework on FPGAs. In: The 2021 ACM/SIGDA international symposium on field-programmable gate arrays, pp 69–80 Chen X, Tan H, Chen Y, He B, Wong WF, Chen D (2021a) ThunderGP: HLS-based graph processing framework on FPGAs. In: The 2021 ACM/SIGDA international symposium on field-programmable gate arrays, pp 69–80
Zurück zum Zitat Chen Y, Huang S, Zhao L, Dissanayake G (2021b) Cramér-rao bounds and optimal design metrics for pose-graph slam. IEEE Trans Robot 37(2):627–641CrossRef Chen Y, Huang S, Zhao L, Dissanayake G (2021b) Cramér-rao bounds and optimal design metrics for pose-graph slam. IEEE Trans Robot 37(2):627–641CrossRef
Zurück zum Zitat Choe M, Yoo J, Lee G, Baek W, Kang U, Shin K (2022) Midas: representative sampling from real-world hypergraphs. In: Proceedings of the ACM web conference, pp 1080–1092 Choe M, Yoo J, Lee G, Baek W, Kang U, Shin K (2022) Midas: representative sampling from real-world hypergraphs. In: Proceedings of the ACM web conference, pp 1080–1092
Zurück zum Zitat Cong W, Forsati R, Kandemir M, Mahdavi M (2020) Minimal variance sampling with provable guarantees for fast training of graph neural networks. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp 1393–1403 Cong W, Forsati R, Kandemir M, Mahdavi M (2020) Minimal variance sampling with provable guarantees for fast training of graph neural networks. In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining, pp 1393–1403
Zurück zum Zitat Fan W (2022) Big graphs: challenges and opportunities. Proc VLDB Endow 15(12):3782–3797CrossRef Fan W (2022) Big graphs: challenges and opportunities. Proc VLDB Endow 15(12):3782–3797CrossRef
Zurück zum Zitat Fan W, He T, Lai L, Li X, Li Y, Li Z, Qian Z, Tian C, Wang L, Xu J et al (2021) Graphscope: a unified engine for big graph processing. Proc VLDB Endow 14(12):2879–2892CrossRef Fan W, He T, Lai L, Li X, Li Y, Li Z, Qian Z, Tian C, Wang L, Xu J et al (2021) Graphscope: a unified engine for big graph processing. Proc VLDB Endow 14(12):2879–2892CrossRef
Zurück zum Zitat Gao H, Liu Y, Ji S (2021) Topology-aware graph pooling networks. IEEE Trans Pattern Anal Mach Intell 43(12):4512–4518CrossRefPubMed Gao H, Liu Y, Ji S (2021) Topology-aware graph pooling networks. IEEE Trans Pattern Anal Mach Intell 43(12):4512–4518CrossRefPubMed
Zurück zum Zitat Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in facebook: a case study of unbiased sampling of osns. In: 2010 Proceedings IEEE Infocom. IEEE, pp 1–9 Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in facebook: a case study of unbiased sampling of osns. In: 2010 Proceedings IEEE Infocom. IEEE, pp 1–9
Zurück zum Zitat Gonzalez JE, Xin RS, Dave A, Crankshaw D, Franklin MJ, Stoica I (2014) \(\{\)GraphX\(\}\): graph processing in a distributed dataflow framework. In: 11th USENIX symposium on operating systems design and implementation (OSDI 14), pp 599–613 Gonzalez JE, Xin RS, Dave A, Crankshaw D, Franklin MJ, Stoica I (2014) \(\{\)GraphX\(\}\): graph processing in a distributed dataflow framework. In: 11th USENIX symposium on operating systems design and implementation (OSDI 14), pp 599–613
Zurück zum Zitat Gove R (2019) A random sampling O (n) force-calculation algorithm for graph layouts. Computer graphics forum, vol 38. Wiley Online Library, pp 739–751 Gove R (2019) A random sampling O (n) force-calculation algorithm for graph layouts. Computer graphics forum, vol 38. Wiley Online Library, pp 739–751
Zurück zum Zitat Hagberg A, Swart P, Chult DS (2008) Exploring network structure, dynamics, and function using NetworkX. Tech. rep., Los Alamos National Lab. (LANL), Los Alamos, NM (United States) Hagberg A, Swart P, Chult DS (2008) Exploring network structure, dynamics, and function using NetworkX. Tech. rep., Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
Zurück zum Zitat Hoang L, Dathathri R, Gill G, Pingali K (2021) Cusp: a customizable streaming edge partitioner for distributed graph analytics. ACM SIGOPS Oper Syst Rev 55(1):47–60CrossRef Hoang L, Dathathri R, Gill G, Pingali K (2021) Cusp: a customizable streaming edge partitioner for distributed graph analytics. ACM SIGOPS Oper Syst Rev 55(1):47–60CrossRef
Zurück zum Zitat Hong SH, Lu S (2020) Graph sampling methods for big complex networks integrating centrality, k-core, and spectral sparsification. In: Proceedings of the 35th annual ACM symposium on applied computing, pp 1843–1851 Hong SH, Lu S (2020) Graph sampling methods for big complex networks integrating centrality, k-core, and spectral sparsification. In: Proceedings of the 35th annual ACM symposium on applied computing, pp 1843–1851
Zurück zum Zitat Hu Z, Zheng W, Lian X (2023) Triangular stability maximization by influence spread over social networks. Proc VLDB Endow 16(11):2818–2831CrossRef Hu Z, Zheng W, Lian X (2023) Triangular stability maximization by influence spread over social networks. Proc VLDB Endow 16(11):2818–2831CrossRef
Zurück zum Zitat Imola J, Murakami T, Chaudhuri K (2022) \(\{\)Communication-Efficient\(\}\) triangle counting under local differential privacy. In: 31st USENIX security symposium (USENIX Security 22), pp 537–554 Imola J, Murakami T, Chaudhuri K (2022) \(\{\)Communication-Efficient\(\}\) triangle counting under local differential privacy. In: 31st USENIX security symposium (USENIX Security 22), pp 537–554
Zurück zum Zitat Jangda A, Polisetty S, Guha A, Serafini M (2021) Accelerating graph sampling for graph machine learning using GPUs. In: Proceedings of the sixteenth European conference on computer systems, pp 311–326 Jangda A, Polisetty S, Guha A, Serafini M (2021) Accelerating graph sampling for graph machine learning using GPUs. In: Proceedings of the sixteenth European conference on computer systems, pp 311–326
Zurück zum Zitat Jin T, Li B, Li Y, Zhou Q, Ma Q, Zhao Y, Chen H, Cheng J (2023) Circinus: fast redundancy-reduced subgraph matching. Proc ACM Manag Data 1(1):1–26CrossRef Jin T, Li B, Li Y, Zhou Q, Ma Q, Zhao Y, Chen H, Cheng J (2023) Circinus: fast redundancy-reduced subgraph matching. Proc ACM Manag Data 1(1):1–26CrossRef
Zurück zum Zitat Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 631–636 Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 631–636
Zurück zum Zitat Li Y, Wu Z, Lin S, Xie H, Lv M, Xu Y, Lui JC (2019) Walking with perception: efficient random walk sampling via common neighbor awareness. In: 2019 IEEE 35th international conference on data engineering (ICDE), IEEE, pp 962–973 Li Y, Wu Z, Lin S, Xie H, Lv M, Xu Y, Lui JC (2019) Walking with perception: efficient random walk sampling via common neighbor awareness. In: 2019 IEEE 35th international conference on data engineering (ICDE), IEEE, pp 962–973
Zurück zum Zitat Liu P, Benson AR, Charikar M (2019) Sampling methods for counting temporal motifs. In: Proceedings of the twelfth ACM international conference on web search and data mining, pp 294–302 Liu P, Benson AR, Charikar M (2019) Sampling methods for counting temporal motifs. In: Proceedings of the twelfth ACM international conference on web search and data mining, pp 294–302
Zurück zum Zitat Mariappan M, Che J, Vora K (2021) Dzig: sparsity-aware incremental processing of streaming graphs. In: Proceedings of the sixteenth European conference on computer systems, pp 83–98 Mariappan M, Che J, Vora K (2021) Dzig: sparsity-aware incremental processing of streaming graphs. In: Proceedings of the sixteenth European conference on computer systems, pp 83–98
Zurück zum Zitat Nakajima K, Shudo K (2022) Social graph restoration via random walk sampling. In: 2022 IEEE 38th international conference on data engineering (ICDE). IEEE, pp 1–14 Nakajima K, Shudo K (2022) Social graph restoration via random walk sampling. In: 2022 IEEE 38th international conference on data engineering (ICDE). IEEE, pp 1–14
Zurück zum Zitat Nguyen D, Lenharth A, Pingali K (2013) A lightweight infrastructure for graph analytics. In: Proceedings of the twenty-fourth ACM symposium on operating systems principles, pp 456–471 Nguyen D, Lenharth A, Pingali K (2013) A lightweight infrastructure for graph analytics. In: Proceedings of the twenty-fourth ACM symposium on operating systems principles, pp 456–471
Zurück zum Zitat Pandey P, Wheatman B, Xu H, Buluc A (2021) Terrace: a hierarchical graph container for skewed dynamic graphs. In: Proceedings of the 2021 international conference on management of data, pp 1372–1385 Pandey P, Wheatman B, Xu H, Buluc A (2021) Terrace: a hierarchical graph container for skewed dynamic graphs. In: Proceedings of the 2021 international conference on management of data, pp 1372–1385
Zurück zum Zitat Preti G, De Francisci MG, Riondato M (2023) Maniacs: approximate mining of frequent subgraph patterns through sampling. ACM Trans Intell Syst Technol 14(3):1–29CrossRef Preti G, De Francisci MG, Riondato M (2023) Maniacs: approximate mining of frequent subgraph patterns through sampling. ACM Trans Intell Syst Technol 14(3):1–29CrossRef
Zurück zum Zitat Rozemberczki B, Kiss O, Sarkar R (2020) Little ball of fur: a python library for graph sampling. In: Proceedings of the 29th ACM international conference on information & knowledge management, pp 3133–3140 Rozemberczki B, Kiss O, Sarkar R (2020) Little ball of fur: a python library for graph sampling. In: Proceedings of the 29th ACM international conference on information & knowledge management, pp 3133–3140
Zurück zum Zitat Sahu S, Mhedhbi A, Salihoglu S, Lin J, Özsu MT (2020) The ubiquity of large graphs and surprising challenges of graph processing: extended survey. VLDB J 29:595–618CrossRef Sahu S, Mhedhbi A, Salihoglu S, Lin J, Özsu MT (2020) The ubiquity of large graphs and surprising challenges of graph processing: extended survey. VLDB J 29:595–618CrossRef
Zurück zum Zitat Sakr S, Bonifati A, Voigt H, Iosup A, Ammar K, Angles R, Aref W, Arenas M, Besta M, Boncz PA et al (2021) The future is big graphs: a community view on graph processing systems. Commun ACM 64(9):62–71CrossRef Sakr S, Bonifati A, Voigt H, Iosup A, Ammar K, Angles R, Aref W, Arenas M, Besta M, Boncz PA et al (2021) The future is big graphs: a community view on graph processing systems. Commun ACM 64(9):62–71CrossRef
Zurück zum Zitat Shin K, Oh S, Kim J, Hooi B, Faloutsos C (2020) Fast, accurate and provable triangle counting in fully dynamic graph streams. ACM Trans Knowl Discov Data (TKDD) 14(2):1–39CrossRef Shin K, Oh S, Kim J, Hooi B, Faloutsos C (2020) Fast, accurate and provable triangle counting in fully dynamic graph streams. ACM Trans Knowl Discov Data (TKDD) 14(2):1–39CrossRef
Zurück zum Zitat Staudt CL, Sazonovs A, Meyerhenke H (2016) Networkit: a tool suite for large-scale complex network analysis. Netw Sci 4(4):508–530CrossRef Staudt CL, Sazonovs A, Meyerhenke H (2016) Networkit: a tool suite for large-scale complex network analysis. Netw Sci 4(4):508–530CrossRef
Zurück zum Zitat Swift IP, Ebrahimi S, Nova A, Asudeh A (2022) Maximizing fair content spread via edge suggestion in social networks. Proc VLDB Endow 15(11):2692–2705CrossRef Swift IP, Ebrahimi S, Nova A, Asudeh A (2022) Maximizing fair content spread via edge suggestion in social networks. Proc VLDB Endow 15(11):2692–2705CrossRef
Zurück zum Zitat Tan Q, Zhang J, Yao J, Liu N, Zhou J, Yang H, Hu X (2021) Sparse-interest network for sequential recommendation. In: Proceedings of the 14th ACM international conference on web search and data mining, pp 598–606 Tan Q, Zhang J, Yao J, Liu N, Zhou J, Yang H, Hu X (2021) Sparse-interest network for sequential recommendation. In: Proceedings of the 14th ACM international conference on web search and data mining, pp 598–606
Zurück zum Zitat Tětek J, Thorup M (2022) Edge sampling and graph parameter estimation via vertex neighborhood accesses. In: Proceedings of the 54th annual ACM SIGACT symposium on theory of computing, pp 1116–1129 Tětek J, Thorup M (2022) Edge sampling and graph parameter estimation via vertex neighborhood accesses. In: Proceedings of the 54th annual ACM SIGACT symposium on theory of computing, pp 1116–1129
Zurück zum Zitat Trolliet T, Cohen N, Giroire F, Hogie L, Pérennes S (2022) Interest clustering coefficient: a new metric for directed networks like twitter. J Complex Netw 10(1):cnab030MathSciNetCrossRef Trolliet T, Cohen N, Giroire F, Hogie L, Pérennes S (2022) Interest clustering coefficient: a new metric for directed networks like twitter. J Complex Netw 10(1):cnab030MathSciNetCrossRef
Zurück zum Zitat Van Koevering K, Benson A, Kleinberg J (2021) Random graphs with prescribed k-core sequences: a new null model for network analysis. In: Proceedings of the web conference, pp 367–378 Van Koevering K, Benson A, Kleinberg J (2021) Random graphs with prescribed k-core sequences: a new null model for network analysis. In: Proceedings of the web conference, pp 367–378
Zurück zum Zitat Wan C, Li Y, Li A, Kim NS, Lin Y (2022) Bns-gcn: efficient full-graph training of graph convolutional networks with partition-parallelism and random boundary node sampling. Proc Mach Learn Syst 4:673–693 Wan C, Li Y, Li A, Kim NS, Lin Y (2022) Bns-gcn: efficient full-graph training of graph convolutional networks with partition-parallelism and random boundary node sampling. Proc Mach Learn Syst 4:673–693
Zurück zum Zitat Yang C, Buluç A, Owens JD (2022) Graphblast: a high-performance linear algebra-based graph framework on the gpu. ACM Trans Math Softw (TOMS) 48(1):1–51MathSciNetCrossRef Yang C, Buluç A, Owens JD (2022) Graphblast: a high-performance linear algebra-based graph framework on the gpu. ACM Trans Math Softw (TOMS) 48(1):1–51MathSciNetCrossRef
Zurück zum Zitat Yang K, Zhang M, Chen K, Ma X, Bai Y, Jiang Y (2019) Knightking: a fast distributed graph random walk engine. In: Proceedings of the 27th ACM symposium on operating systems principles, pp 524–537 Yang K, Zhang M, Chen K, Ma X, Bai Y, Jiang Y (2019) Knightking: a fast distributed graph random walk engine. In: Proceedings of the 27th ACM symposium on operating systems principles, pp 524–537
Zurück zum Zitat You J, Leskovec J, He K, Xie S (2020) Graph structure of neural networks. In: International conference on machine learning. PMLR, pp 10,881–10,891 You J, Leskovec J, He K, Xie S (2020) Graph structure of neural networks. In: International conference on machine learning. PMLR, pp 10,881–10,891
Zurück zum Zitat Zeng H, Zhou H, Srivastava A, Kannan R, Prasanna V (2020) Graphsaint: graph sampling based inductive learning method Zeng H, Zhou H, Srivastava A, Kannan R, Prasanna V (2020) Graphsaint: graph sampling based inductive learning method
Zurück zum Zitat Zhang Z, Liu Q, Hu Q, Lee CK (2022) Hierarchical graph transformer with adaptive node sampling. Adv Neural Inf Process Syst 35:21171–21183 Zhang Z, Liu Q, Hu Q, Lee CK (2022) Hierarchical graph transformer with adaptive node sampling. Adv Neural Inf Process Syst 35:21171–21183
Zurück zum Zitat Zhao Y, Jiang H, Qin Y, Xie H, Wu Y, Liu S, Zhou Z, Xia J, Zhou F et al (2020) Preserving minority structures in graph sampling. IEEE Trans Vis Comput Graph 27(2):1698–1708CrossRef Zhao Y, Jiang H, Qin Y, Xie H, Wu Y, Liu S, Zhou Z, Xia J, Zhou F et al (2020) Preserving minority structures in graph sampling. IEEE Trans Vis Comput Graph 27(2):1698–1708CrossRef
Zurück zum Zitat Zheng C, Zong B, Cheng W, Song D, Ni J, Yu W, Chen H, Wang W (2020) Robust graph representation learning via neural sparsification. In: International conference on machine learning. PMLR, pp 11458–11468 Zheng C, Zong B, Cheng W, Song D, Ni J, Yu W, Chen H, Wang W (2020) Robust graph representation learning via neural sparsification. In: International conference on machine learning. PMLR, pp 11458–11468
Zurück zum Zitat Zhu Z, Wu K, Liu Z (2023) Arya: arbitrary graph pattern mining with decomposition-based sampling. In: 20th USENIX symposium on networked systems design and implementation (NSDI 23), pp 1013–1030 Zhu Z, Wu K, Liu Z (2023) Arya: arbitrary graph pattern mining with decomposition-based sampling. In: 20th USENIX symposium on networked systems design and implementation (NSDI 23), pp 1013–1030
Metadaten
Titel
An adaptive graph sampling framework for graph analytics
verfasst von
Kewen Wang
Publikationsdatum
01.12.2024
Verlag
Springer Vienna
Erschienen in
Social Network Analysis and Mining / Ausgabe 1/2024
Print ISSN: 1869-5450
Elektronische ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-023-01157-x

Weitere Artikel der Ausgabe 1/2024

Social Network Analysis and Mining 1/2024 Zur Ausgabe

Premium Partner