Skip to main content
Top

2018 | OriginalPaper | Chapter

SSRW: A Scalable Algorithm for Estimating Graphlet Statistics Based on Random Walk

Authors : Chen Yang, Min Lyu, Yongkun Li, Qianqian Zhao, Yinlong Xu

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Mining graphlet statistics is very meaningful due to its wide applications in social networks, bioinformatics and information security, etc. However, it is a big challenge to exactly count graphlet statistics as the number of subgraphs exponentially increases with the graph size, so sampling algorithms are widely used to estimate graphlet statistics within reasonable time. However, existing sampling algorithms are not scalable for large graphlets, e.g., they may get stuck when estimating graphlets with more than five nodes. To address this issue, we propose a highly scalable algorithm, Scalable subgraph Sampling via Random Walk (SSRW), for graphlet counts and concentrations. SSRW samples graphlets by generating new nodes from the neighbors of previously visited nodes instead of fixed ones. Thanks to this flexibility, we can generate any k-graphlets in a unified way and estimate statistics of k-graphlet efficiently even for large k. Our extensive experiments on estimating counts and concentrations of \(\{4,5,6,7\}\)-graphlets show that SSRW algorithm is scalable, accurate and fast.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
2.
go back to reference Ahmed, N.K., Neville, J., Rossi, R.A., Duffield, N.G., Willke, T.L.: Graphlet decomposition: framework, algorithms, and applications. Knowl. Inf. Syst. 50, 1–34 (2016) Ahmed, N.K., Neville, J., Rossi, R.A., Duffield, N.G., Willke, T.L.: Graphlet decomposition: framework, algorithms, and applications. Knowl. Inf. Syst. 50, 1–34 (2016)
3.
go back to reference Lovász, L.: Random walks on graphs: a survey. Combinatorics: Paul Erdös Is Eighty 2(1), 1–46 (1993) Lovász, L.: Random walks on graphs: a survey. Combinatorics: Paul Erdös Is Eighty 2(1), 1–46 (1993)
4.
go back to reference Bhuiyan, M.A., Rahman, M., Al Hasan, M.: Guise: uniform sampling of graphlets for large graph analysis. In: ICDM. IEEE (2012) Bhuiyan, M.A., Rahman, M., Al Hasan, M.: Guise: uniform sampling of graphlets for large graph analysis. In: ICDM. IEEE (2012)
5.
go back to reference Chen, X., Li, Y., Wang, P., Lui, J.: A general framework for estimating graphlet statistics via random walk. VLDB 10(3), 253–264 (2016) Chen, X., Li, Y., Wang, P., Lui, J.: A general framework for estimating graphlet statistics via random walk. VLDB 10(3), 253–264 (2016)
6.
go back to reference Chen, X., Lui, J.C.: Mining graphlet counts in online social networks. In: ICDM. IEEE (2016) Chen, X., Lui, J.C.: Mining graphlet counts in online social networks. In: ICDM. IEEE (2016)
7.
go back to reference Han, G., Sethu, H.: Waddling random walk: fast and accurate mining of motif statistics in large graphs. In: ICDM. IEEE (2016) Han, G., Sethu, H.: Waddling random walk: fast and accurate mining of motif statistics in large graphs. In: ICDM. IEEE (2016)
8.
go back to reference Hardiman, S.J., Katzir, L.: Estimating clustering coefficients and size of social networks via random walk. In: WWW. ACM (2013) Hardiman, S.J., Katzir, L.: Estimating clustering coefficients and size of social networks via random walk. In: WWW. ACM (2013)
9.
go back to reference Holland, P.W., Leinhardt, S.: A method for detecting structure in sociometric data. Am. J. Sociol. 76(3), 492–513 (1970)CrossRef Holland, P.W., Leinhardt, S.: A method for detecting structure in sociometric data. Am. J. Sociol. 76(3), 492–513 (1970)CrossRef
10.
go back to reference Jha, M., Seshadhri, C., Pinar, A.: Path sampling: a fast and provable method for estimating 4-vertex subgraph counts. In: WWW. ACM (2015) Jha, M., Seshadhri, C., Pinar, A.: Path sampling: a fast and provable method for estimating 4-vertex subgraph counts. In: WWW. ACM (2015)
11.
go back to reference Lee, C.-H., Xu, X., Eun, D.Y.: Beyond random walk and metropolis-hastings samplers: why you should not backtrack for unbiased graph sampling. In: SIGMETRICS (2012) Lee, C.-H., Xu, X., Eun, D.Y.: Beyond random walk and metropolis-hastings samplers: why you should not backtrack for unbiased graph sampling. In: SIGMETRICS (2012)
13.
go back to reference Lim, Y., Kang, U.: Mascot: memory-efficient and accurate sampling for counting local triangles in graph streams. In: KDD (2015) Lim, Y., Kang, U.: Mascot: memory-efficient and accurate sampling for counting local triangles in graph streams. In: KDD (2015)
14.
go back to reference Marcus, D., Shavitt, Y.: RAGE-a rapid graphlet enumerator for large networks. Comput. Netw. 56(2), 810–819 (2012)CrossRef Marcus, D., Shavitt, Y.: RAGE-a rapid graphlet enumerator for large networks. Comput. Netw. 56(2), 810–819 (2012)CrossRef
15.
go back to reference Milenkovic, T., Przulj, N.: Uncovering biological network function via graphlet degree signatures. arXiv preprint arXiv:0802.0556 (2008) Milenkovic, T., Przulj, N.: Uncovering biological network function via graphlet degree signatures. arXiv preprint arXiv:​0802.​0556 (2008)
16.
go back to reference Mislove, A., Marcon, M., Gummadi, K.P., Druschel, P., Bhattacharjee, B.: Measurement and analysis of online social networks. In: SIGCOMM. ACM (2007) Mislove, A., Marcon, M., Gummadi, K.P., Druschel, P., Bhattacharjee, B.: Measurement and analysis of online social networks. In: SIGCOMM. ACM (2007)
17.
go back to reference Peng, W., Gao, T., Sisodia, D., Saha, T.K., Li, F., Al Hasan, M.: ACTS: extracting android app topological signature through graphlet sampling. In: 2016 IEEE Conference on Communications and Network Security (CNS), pp. 37–45. IEEE (2016) Peng, W., Gao, T., Sisodia, D., Saha, T.K., Li, F., Al Hasan, M.: ACTS: extracting android app topological signature through graphlet sampling. In: 2016 IEEE Conference on Communications and Network Security (CNS), pp. 37–45. IEEE (2016)
18.
19.
go back to reference Pržulj, N., Corneil, D.G., Jurisica, I.: Modeling interactome: scale-free or geometric? Bioinformatics 20(18), 3508–3515 (2004)CrossRef Pržulj, N., Corneil, D.G., Jurisica, I.: Modeling interactome: scale-free or geometric? Bioinformatics 20(18), 3508–3515 (2004)CrossRef
20.
go back to reference Rahman, M., Bhuiyan, M.A., Al Hasan, M.: Graft: an efficient graphlet counting method for large graph analysis. TKDE 26(10), 2466–2478 (2014) Rahman, M., Bhuiyan, M.A., Al Hasan, M.: Graft: an efficient graphlet counting method for large graph analysis. TKDE 26(10), 2466–2478 (2014)
21.
go back to reference Ribeiro, P., Silva, F.: G-tries: an efficient data structure for discovering network motifs. In: Proceedings of the 2010 ACM Symposium on Applied Computing, pp. 1559–1566. ACM (2010) Ribeiro, P., Silva, F.: G-tries: an efficient data structure for discovering network motifs. In: Proceedings of the 2010 ACM Symposium on Applied Computing, pp. 1559–1566. ACM (2010)
24.
go back to reference Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., Borgwardt, K.: Efficient graphlet kernels for large graph comparison. In: Artificial Intelligence and Statistics, pp. 488–495 (2009) Shervashidze, N., Vishwanathan, S., Petri, T., Mehlhorn, K., Borgwardt, K.: Efficient graphlet kernels for large graph comparison. In: Artificial Intelligence and Statistics, pp. 488–495 (2009)
25.
go back to reference Wang, P., Lui, J., Ribeiro, B., Towsley, D., Zhao, J., Guan, X.: Efficiently estimating motif statistics of large networks. TKDD 9(2), 8 (2014)CrossRef Wang, P., Lui, J., Ribeiro, B., Towsley, D., Zhao, J., Guan, X.: Efficiently estimating motif statistics of large networks. TKDD 9(2), 8 (2014)CrossRef
26.
go back to reference Wang, P., Zhao, J., Zhang, X., Li, Z., Cheng, J., Lui, J.C., Towsley, D., Tao, J., Guan, X.: MOSS-5: a fast method of approximating counts of 5-node graphlets in large graphs. TKDE 30, 73–86 (2017) Wang, P., Zhao, J., Zhang, X., Li, Z., Cheng, J., Lui, J.C., Towsley, D., Tao, J., Guan, X.: MOSS-5: a fast method of approximating counts of 5-node graphlets in large graphs. TKDE 30, 73–86 (2017)
Metadata
Title
SSRW: A Scalable Algorithm for Estimating Graphlet Statistics Based on Random Walk
Authors
Chen Yang
Min Lyu
Yongkun Li
Qianqian Zhao
Yinlong Xu
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-91452-7_18

Premium Partner