Skip to main content

2017 | OriginalPaper | Buchkapitel

Efficient Local Clustering Coefficient Estimation in Massive Graphs

verfasst von : Hao Zhang, Yuanyuan Zhu, Lu Qin, Hong Cheng, Jeffrey Xu Yu

Erschienen in: Database Systems for Advanced Applications

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Graph is a powerful tool to model interactions in disparate applications, and how to assess the structure of a graph is an essential task across all the domains. As a classic measure to characterize the connectivity of graphs, clustering coefficient and its variants are of particular interest in graph structural analysis. However, the largest of today’s graphs may have nodes and edges in billion scale, which makes the simple task of computing clustering coefficients quite complicated and expensive. Thus, approximate solutions have attracted much attention from researchers recently. However, they only target global and binned degree-wise clustering coefficient estimation, and their techniques are not suitable for local clustering coefficient estimation that is of great importance for individual nodes. In this paper, we propose a new sampling scheme to estimate the local clustering coefficient with error bounded, where global and binned degree-wise clustering coefficients can be considered as special cases. Meanwhile, based on our sampling scheme, we propose a new framework to estimate all the three clustering coefficients in a unified way. To make it scalable on massive graphs, we further design an efficient MapReduce algorithm under this framework. Extensive experiments validate the efficiency and effectiveness of our algorithms, which significantly outperform state-of-the-art exact and approximate algorithms on many real graph datasets.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Becchetti, L., Boldi, P., Castillo, C., Gionis, A.: Efficient algorithms for large-scale local triangle counting. TKDD 4(3) (2010). Article no. 13 Becchetti, L., Boldi, P., Castillo, C., Gionis, A.: Efficient algorithms for large-scale local triangle counting. TKDD 4(3) (2010). Article no. 13
2.
Zurück zum Zitat Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)CrossRefMATH Bloom, B.H.: Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13(7), 422–426 (1970)CrossRefMATH
3.
Zurück zum Zitat Chen, D.-B., Gao, H., Lü, L., Zhou, T.: Identifying influential nodes in large-scale directed networks: the role of clustering. PloS one 8(10), e77455 (2013)CrossRef Chen, D.-B., Gao, H., Lü, L., Zhou, T.: Identifying influential nodes in large-scale directed networks: the role of clustering. PloS one 8(10), e77455 (2013)CrossRef
4.
Zurück zum Zitat Chu, S., Cheng, J.: Triangle listing in massive networks and its applications. In: KDD, pp. 672–680. ACM (2011) Chu, S., Cheng, J.: Triangle listing in massive networks and its applications. In: KDD, pp. 672–680. ACM (2011)
5.
Zurück zum Zitat Cohen, J.: Graph twiddling in a mapreduce world. Comput. Sci. Eng. 11(4), 29–41 (2009)CrossRef Cohen, J.: Graph twiddling in a mapreduce world. Comput. Sci. Eng. 11(4), 29–41 (2009)CrossRef
6.
Zurück zum Zitat Eckmann, J.-P., Moses, E.: Curvature of co-links uncovers hidden thematic layers in the world wide web. PNAS 99(9), 5825–5829 (2002)MathSciNetCrossRef Eckmann, J.-P., Moses, E.: Curvature of co-links uncovers hidden thematic layers in the world wide web. PNAS 99(9), 5825–5829 (2002)MathSciNetCrossRef
7.
8.
Zurück zum Zitat Hu, X., Tao, Y., Chung, C.-W.: Massive graph triangulation. In: SIGMOD, pp. 325–336. ACM (2013) Hu, X., Tao, Y., Chung, C.-W.: Massive graph triangulation. In: SIGMOD, pp. 325–336. ACM (2013)
9.
Zurück zum Zitat Jha, M., Seshadhri, C., Pinar, A.: A space-efficient streaming algorithm for estimating transitivity and triangle counts using the birthday paradox. TKDD 9(3), 15:1–15:21 (2015) Jha, M., Seshadhri, C., Pinar, A.: A space-efficient streaming algorithm for estimating transitivity and triangle counts using the birthday paradox. TKDD 9(3), 15:1–15:21 (2015)
10.
Zurück zum Zitat Kolda, T.G., Pinar, A., Plantenga, T., Seshadhri, C., Task, C.: Counting triangles in massive graphs with mapreduce. SISC 36(5), S48–S77 (2014)MathSciNetCrossRefMATH Kolda, T.G., Pinar, A., Plantenga, T., Seshadhri, C., Task, C.: Counting triangles in massive graphs with mapreduce. SISC 36(5), S48–S77 (2014)MathSciNetCrossRefMATH
11.
Zurück zum Zitat Kolountzakis, M.N., Miller, G.L., Peng, R., Tsourakakis, C.E.: Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Math. 8(1–2), 161–185 (2012)MathSciNetCrossRefMATH Kolountzakis, M.N., Miller, G.L., Peng, R., Tsourakakis, C.E.: Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Math. 8(1–2), 161–185 (2012)MathSciNetCrossRefMATH
12.
Zurück zum Zitat Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: WWW 2010: Proceedings of the 19th International Conference on World wide web, pp. 591–600. ACM, New York (2010) Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: WWW 2010: Proceedings of the 19th International Conference on World wide web, pp. 591–600. ACM, New York (2010)
13.
Zurück zum Zitat Latapy, M.: Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoret. Comput. Sci. 407(1), 458–473 (2008)MathSciNetCrossRefMATH Latapy, M.: Main-memory triangle computations for very large (sparse (power-law)) graphs. Theoret. Comput. Sci. 407(1), 458–473 (2008)MathSciNetCrossRefMATH
15.
Zurück zum Zitat Lim, Y., Kang, U.: MASCOT: memory-efficient and accurate sampling for counting local triangles in graph streams. In: KDD, pp. 685–694 (2015) Lim, Y., Kang, U.: MASCOT: memory-efficient and accurate sampling for counting local triangles in graph streams. In: KDD, pp. 685–694 (2015)
16.
Zurück zum Zitat Lin, Y., Xiong, H., Chen, M., Ding, L., Cao, Y., Wang, G., Liu, M.: Dynamical model and analysis of cascading failures on the complex power grids. Kybernetes 40(5/6), 814–823 (2011)CrossRef Lin, Y., Xiong, H., Chen, M., Ding, L., Cao, Y., Wang, G., Liu, M.: Dynamical model and analysis of cascading failures on the complex power grids. Kybernetes 40(5/6), 814–823 (2011)CrossRef
17.
Zurück zum Zitat Masuda, N.: Clustering in large networks does not promote upstream reciprocity. PloS one 6(10), e25190 (2011)CrossRef Masuda, N.: Clustering in large networks does not promote upstream reciprocity. PloS one 6(10), e25190 (2011)CrossRef
19.
Zurück zum Zitat Menegola, B.: An external memory algorithm for listing triangles (2010) Menegola, B.: An external memory algorithm for listing triangles (2010)
20.
Zurück zum Zitat Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, New York (2005) Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, New York (2005)
21.
Zurück zum Zitat Pagh, R., Tsourakakis, C.E.: Colorful triangle counting and a mapreduce implementation. Inf. Process. Lett. 112(7), 277–281 (2012)MathSciNetCrossRefMATH Pagh, R., Tsourakakis, C.E.: Colorful triangle counting and a mapreduce implementation. Inf. Process. Lett. 112(7), 277–281 (2012)MathSciNetCrossRefMATH
22.
Zurück zum Zitat Park, H.-M., Chung, C.-W.: An efficient mapreduce algorithm for counting triangles in a very large graph. In: CIKM, pp. 539–548. ACM (2013) Park, H.-M., Chung, C.-W.: An efficient mapreduce algorithm for counting triangles in a very large graph. In: CIKM, pp. 539–548. ACM (2013)
23.
Zurück zum Zitat Park, H.-M., Silvestri, F., Kang, U., Pagh, R.: Mapreduce triangle enumeration with guarantees. In: CIKM, pp. 1739–1748. ACM (2014) Park, H.-M., Silvestri, F., Kang, U., Pagh, R.: Mapreduce triangle enumeration with guarantees. In: CIKM, pp. 1739–1748. ACM (2014)
24.
Zurück zum Zitat Schank, T., Wagner, D.: Approximating clustering coefficient and transitivity. J. Graph Algorithms Appl. 9(2), 265–275 (2005)MathSciNetCrossRefMATH Schank, T., Wagner, D.: Approximating clustering coefficient and transitivity. J. Graph Algorithms Appl. 9(2), 265–275 (2005)MathSciNetCrossRefMATH
25.
Zurück zum Zitat Serfling, R.J.: Probability inequalities for the sum in sampling without replacement. Ann. Stat. 2(1), 39–48 (1974) Serfling, R.J.: Probability inequalities for the sum in sampling without replacement. Ann. Stat. 2(1), 39–48 (1974)
26.
Zurück zum Zitat Seshadhri, C., Kolda, T.G., Pinar, A.: Community structure and scale-free collections of erdős-rényi graphs. Phys. Rev. E 85(5), 056109 (2012)CrossRef Seshadhri, C., Kolda, T.G., Pinar, A.: Community structure and scale-free collections of erdős-rényi graphs. Phys. Rev. E 85(5), 056109 (2012)CrossRef
27.
Zurück zum Zitat Seshadhri, C., Pinar, A., Kolda, T.G.: Fast triangle counting through wedge sampling. In: SDM, vol. 4, p. 5. Citeseer (2013) Seshadhri, C., Pinar, A., Kolda, T.G.: Fast triangle counting through wedge sampling. In: SDM, vol. 4, p. 5. Citeseer (2013)
28.
Zurück zum Zitat Seshadhri, C., Pinar, A., Kolda, T.G.: Triadic measures on graphs: the power of wedge sampling. In: SDM, pp. 10–18. SIAM (2013) Seshadhri, C., Pinar, A., Kolda, T.G.: Triadic measures on graphs: the power of wedge sampling. In: SDM, pp. 10–18. SIAM (2013)
29.
Zurück zum Zitat Stefani, L.D., Epasto, A., Riondato, M., Upfal, E.: Trièst: counting local and global triangles in fully-dynamic streams with fixed memory size. In: KDD, pp. 825–834 (2016) Stefani, L.D., Epasto, A., Riondato, M., Upfal, E.: Trièst: counting local and global triangles in fully-dynamic streams with fixed memory size. In: KDD, pp. 825–834 (2016)
30.
Zurück zum Zitat Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: WWW, pp. 607–614. ACM (2011) Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: WWW, pp. 607–614. ACM (2011)
31.
Zurück zum Zitat Trpevski, D., Tang, W.K., Kocarev, L.: Model for rumor spreading over networks. Phys. Rev. E 81(5), 056102 (2010)CrossRef Trpevski, D., Tang, W.K., Kocarev, L.: Model for rumor spreading over networks. Phys. Rev. E 81(5), 056102 (2010)CrossRef
32.
Zurück zum Zitat Tsourakakis, C.E.: Fast counting of triangles in large real networks without counting: algorithms and laws. In: ICDM, pp 608–617 (2008) Tsourakakis, C.E.: Fast counting of triangles in large real networks without counting: algorithms and laws. In: ICDM, pp 608–617 (2008)
33.
Zurück zum Zitat Tsourakakis, C.E., Drineas, P., Michelakis, E., Koutis, I., Faloutsos, C.: Spectral counting of triangles via element-wise sparsification and triangle-based link recommendation. Soc. Netw. Anal. Mining 1(2), 75–81 (2011)CrossRef Tsourakakis, C.E., Drineas, P., Michelakis, E., Koutis, I., Faloutsos, C.: Spectral counting of triangles via element-wise sparsification and triangle-based link recommendation. Soc. Netw. Anal. Mining 1(2), 75–81 (2011)CrossRef
34.
Zurück zum Zitat Tsourakakis, C.E., Kang, U., Miller, G.L., Faloutsos, C.: Doulion: counting triangles in massive graphs with a coin. In: KDD, pp. 837–846. ACM (2009) Tsourakakis, C.E., Kang, U., Miller, G.L., Faloutsos, C.: Doulion: counting triangles in massive graphs with a coin. In: KDD, pp. 837–846. ACM (2009)
35.
36.
Zurück zum Zitat Wu, X., Lu, H.: Cluster synchronization in the adaptive complex dynamical networks via a novel approach. Phys. Lett. A 375(14), 1559–1565 (2011)CrossRefMATH Wu, X., Lu, H.: Cluster synchronization in the adaptive complex dynamical networks via a novel approach. Phys. Lett. A 375(14), 1559–1565 (2011)CrossRefMATH
37.
Zurück zum Zitat Yang, Z., Wilson, C., Wang, X., Gao, T., Zhao, B.Y., Dai, Y.: Uncovering social network sybils in the wild. TKDD 8(1), 2 (2014)CrossRef Yang, Z., Wilson, C., Wang, X., Gao, T., Zhao, B.Y., Dai, Y.: Uncovering social network sybils in the wild. TKDD 8(1), 2 (2014)CrossRef
Metadaten
Titel
Efficient Local Clustering Coefficient Estimation in Massive Graphs
verfasst von
Hao Zhang
Yuanyuan Zhu
Lu Qin
Hong Cheng
Jeffrey Xu Yu
Copyright-Jahr
2017
DOI
https://doi.org/10.1007/978-3-319-55699-4_23