Skip to main content
Erschienen in: The VLDB Journal 6/2020

12.08.2020 | Regular Paper

Temporal locality-aware sampling for accurate triangle counting in real graph streams

verfasst von: Dongjin Lee, Kijung Shin, Christos Faloutsos

Erschienen in: The VLDB Journal | Ausgabe 6/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

If we cannot store all edges in a dynamic graph, which edges should we store to estimate the triangle count accurately? Counting triangles (i.e., cliques of size three) is a fundamental graph problem with many applications in social network analysis, web mining, anomaly detection, etc. Recently, much effort has been made to accurately estimate the counts of global triangles (i.e., all triangles) and local triangles (i.e., all triangle incident to each node) in large dynamic graphs, especially with limited space. Although existing algorithms use sampling techniques without considering temporal dependencies in edges, we observe temporal locality in the formation of triangles in real dynamic graphs. That is, future edges are more likely to form triangles with recent edges than with older edges. In this work, we propose a family of single-pass streaming algorithms called Waiting-Room Sampling (WRS) for estimating the counts of global and local triangles in a fully dynamic graph, where edges are inserted and deleted over time, within a fixed memory budget. WRS exploits the temporal locality by always storing the most recent edges, which future edges are more likely to form triangles with, in the waiting room, while it uses reservoir sampling and its variant for the remaining edges. Our theoretical and empirical analyses show that WRS is: (a) Fast and ‘any time’: runs in linear time, always maintaining and updating estimates, while the input graph evolves, (b) Effective: yields up to 47% smaller estimation error than its best competitors, and (c) Theoretically sound: gives unbiased estimates with small variances under the temporal locality.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
A preliminary version of WRS for insertion-only graph streams was presented in [35]. This work is an extended version of [35] with (a) a new algorithm for fully dynamic graph streams, (b) theoretical analyses on its accuracy and complexity, and (c) additional experiments with more datasets, competitors, and evaluation metrics.
 
2
That is, for each measure, we computed the measure of the estimates on each trial, and then, we reported the mean of the computed values. We did not report each measure of the mean of the estimates obtained from all trials.
 
3
For Mascot and \({\textsc {ThinkD}}_{\textsc {FAST}}\), we set the sampling probability p so that the expected number of sampled edges is equal to the memory budget.
 
4
All the considered algorithms were always better than setting all estimates to zero in terms of global error and rank correlation. However, all the considered algorithms were not in terms local error when the memory budget k was extremely small, and thus, the variances of estimates of local triangle counts were very large.
 
5
Specifically, we ran the Knuth shuffle [22] while skipping different fractions of iterations.
 
Literatur
1.
Zurück zum Zitat Ahmed, N.K., Duffield, N., Neville, J., Kompella, R.: Graph sample and hold: a framework for big-graph analytics. In: KDD, pp. 1446–1455 (2014) Ahmed, N.K., Duffield, N., Neville, J., Kompella, R.: Graph sample and hold: a framework for big-graph analytics. In: KDD, pp. 1446–1455 (2014)
2.
Zurück zum Zitat Ahmed, N.K., Duffield, N., Willke, T.L., Rossi, R.A.: On sampling from massive graph streams. PVLDB 10(11), 1430–1441 (2017) Ahmed, N.K., Duffield, N., Willke, T.L., Rossi, R.A.: On sampling from massive graph streams. PVLDB 10(11), 1430–1441 (2017)
3.
Zurück zum Zitat Arifuzzaman, S., Khan, M., Marathe, M.: Patric: A parallel algorithm for counting triangles in massive networks. In: CIKM, pp. 529–538 (2013) Arifuzzaman, S., Khan, M., Marathe, M.: Patric: A parallel algorithm for counting triangles in massive networks. In: CIKM, pp. 529–538 (2013)
4.
Zurück zum Zitat Bar-Yossef, Z., Kumar, R., Sivakumar, D.: Reductions in streaming algorithms, with an application to counting triangles in graphs. In: SODA, pp. 623–632 (2002) Bar-Yossef, Z., Kumar, R., Sivakumar, D.: Reductions in streaming algorithms, with an application to counting triangles in graphs. In: SODA, pp. 623–632 (2002)
5.
6.
Zurück zum Zitat Becchetti, L., Boldi, P., Castillo, C., Gionis, A.: Efficient algorithms for large-scale local triangle counting. ACM Trans. Knowl. Discov. Data 4(3), 13 (2010)CrossRef Becchetti, L., Boldi, P., Castillo, C., Gionis, A.: Efficient algorithms for large-scale local triangle counting. ACM Trans. Knowl. Discov. Data 4(3), 13 (2010)CrossRef
7.
Zurück zum Zitat Brown, P.G., Haas, P.J.: Techniques for warehousing of sample data. In: ICDE, pp. 6–6 (2006) Brown, P.G., Haas, P.J.: Techniques for warehousing of sample data. In: ICDE, pp. 6–6 (2006)
8.
Zurück zum Zitat De Stefani, L., Epasto, A., Riondato, M., Upfal, E.: Trièst: Counting local and global triangles in fully dynamic streams with fixed memory size. ACM Trans. Knowl. Discov. Data 11(4), 43 (2017)CrossRef De Stefani, L., Epasto, A., Riondato, M., Upfal, E.: Trièst: Counting local and global triangles in fully dynamic streams with fixed memory size. ACM Trans. Knowl. Discov. Data 11(4), 43 (2017)CrossRef
9.
Zurück zum Zitat Eckmann, J.P., Moses, E.: Curvature of co-links uncovers hidden thematic layers in the world wide web. PNAS 99(9), 5825–5829 (2002)MathSciNetCrossRef Eckmann, J.P., Moses, E.: Curvature of co-links uncovers hidden thematic layers in the world wide web. PNAS 99(9), 5825–5829 (2002)MathSciNetCrossRef
11.
Zurück zum Zitat Etemadi, R., Lu, J., Tsin, Y.H.: Efficient estimation of triangles in very large graphs. In: CIKM, pp. 1251–1260 (2016) Etemadi, R., Lu, J., Tsin, Y.H.: Efficient estimation of triangles in very large graphs. In: CIKM, pp. 1251–1260 (2016)
12.
Zurück zum Zitat Gehrke, J., Ginsparg, P., Kleinberg, J.: Overview of the 2003 kdd cup. ACM SIGKDD Explor. Newsl. 5(2), 149–151 (2003)CrossRef Gehrke, J., Ginsparg, P., Kleinberg, J.: Overview of the 2003 kdd cup. ACM SIGKDD Explor. Newsl. 5(2), 149–151 (2003)CrossRef
13.
Zurück zum Zitat Gemulla, R., Lehner, W., Haas, P.J.: Maintaining bounded-size sample synopses of evolving datasets. VLDB J. 17(2), 173–201 (2008)CrossRef Gemulla, R., Lehner, W., Haas, P.J.: Maintaining bounded-size sample synopses of evolving datasets. VLDB J. 17(2), 173–201 (2008)CrossRef
14.
Zurück zum Zitat Hall, B.H., Jaffe, A.B., Trajtenberg, M.: The nber patent citation data file: Lessons, insights and methodological tools. Tech. rep., National Bureau of Economic Research (2001) Hall, B.H., Jaffe, A.B., Trajtenberg, M.: The nber patent citation data file: Lessons, insights and methodological tools. Tech. rep., National Bureau of Economic Research (2001)
15.
Zurück zum Zitat Han, G., Sethu, H.: Edge sample and discard: a new algorithm for counting triangles in large dynamic graphs. In: ASONAM, pp. 44–49 (2017) Han, G., Sethu, H.: Edge sample and discard: a new algorithm for counting triangles in large dynamic graphs. In: ASONAM, pp. 44–49 (2017)
16.
Zurück zum Zitat Hu, X., Tao, Y., Chung, C.W.: I/O-efficient algorithms on triangle listing and counting. ACM Trans. Database Syst. 39(4), 27 (2014)MathSciNetCrossRef Hu, X., Tao, Y., Chung, C.W.: I/O-efficient algorithms on triangle listing and counting. ACM Trans. Database Syst. 39(4), 27 (2014)MathSciNetCrossRef
17.
Zurück zum Zitat Jha, M., Seshadhri, C., Pinar, A.: A space efficient streaming algorithm for triangle counting using the birthday paradox. In: KDD, pp. 589–597 (2013) Jha, M., Seshadhri, C., Pinar, A.: A space efficient streaming algorithm for triangle counting using the birthday paradox. In: KDD, pp. 589–597 (2013)
18.
Zurück zum Zitat Jung, M., Lim, Y., Lee, S., Kang, U.: Furl: Fixed-memory and uncertainty reducing local triangle counting for multigraph streams. Data Min. Knowl. Discov. 33(5), 1225–1253 (2019)MathSciNetCrossRef Jung, M., Lim, Y., Lee, S., Kang, U.: Furl: Fixed-memory and uncertainty reducing local triangle counting for multigraph streams. Data Min. Knowl. Discov. 33(5), 1225–1253 (2019)MathSciNetCrossRef
19.
Zurück zum Zitat Kallaugher, J., Price, E.: A hybrid sampling scheme for triangle counting. In: SODA, pp. 1778–1797 (2017) Kallaugher, J., Price, E.: A hybrid sampling scheme for triangle counting. In: SODA, pp. 1778–1797 (2017)
20.
Zurück zum Zitat Kim, J., Han, W.S., Lee, S., Park, K., Yu, H.: Opt: A new framework for overlapped and parallel triangulation in large-scale graphs. In: SIGMOD, pp. 637–648 (2014) Kim, J., Han, W.S., Lee, S., Park, K., Yu, H.: Opt: A new framework for overlapped and parallel triangulation in large-scale graphs. In: SIGMOD, pp. 637–648 (2014)
21.
Zurück zum Zitat Klimt, B., Yang, Y.: Introducing the enron corpus. In: CEAS (2004) Klimt, B., Yang, Y.: Introducing the enron corpus. In: CEAS (2004)
22.
Zurück zum Zitat Knuth, D.E.: Seminumerical algorithms. Art Comput. Program. 2, 139–140 (1997) Knuth, D.E.: Seminumerical algorithms. Art Comput. Program. 2, 139–140 (1997)
23.
Zurück zum Zitat Kolountzakis, M.N., Miller, G.L., Peng, R., Tsourakakis, C.E.: Efficient triangle counting in large graphs via degree-based vertex partitioning. In: WAW, pp. 15–24 (2010) Kolountzakis, M.N., Miller, G.L., Peng, R., Tsourakakis, C.E.: Efficient triangle counting in large graphs via degree-based vertex partitioning. In: WAW, pp. 15–24 (2010)
24.
Zurück zum Zitat Kutzkov, K., Pagh, R.: On the streaming complexity of computing local clustering coefficients. In: WSDM, pp. 677–686 (2013) Kutzkov, K., Pagh, R.: On the streaming complexity of computing local clustering coefficients. In: WSDM, pp. 677–686 (2013)
25.
Zurück zum Zitat Kutzkov, K., Pagh, R.: Triangle counting in dynamic graph streams. In: SWAT, pp. 306–318 (2014) Kutzkov, K., Pagh, R.: Triangle counting in dynamic graph streams. In: SWAT, pp. 306–318 (2014)
26.
Zurück zum Zitat Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph evolution: densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1(1), 2 (2007)CrossRef Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph evolution: densification and shrinking diameters. ACM Trans. Knowl. Discov. Data 1(1), 2 (2007)CrossRef
27.
Zurück zum Zitat Lim, Y., Kang, U.: Mascot: Memory-efficient and accurate sampling for counting local triangles in graph streams. In: KDD, pp. 685–694 (2015) Lim, Y., Kang, U.: Mascot: Memory-efficient and accurate sampling for counting local triangles in graph streams. In: KDD, pp. 685–694 (2015)
28.
Zurück zum Zitat McPherson, M., Smith-Lovin, L., Cook, J.M.: Birds of a feather: homophily in social networks. Annu. Rev. Sociol. 27(1), 415–444 (2001)CrossRef McPherson, M., Smith-Lovin, L., Cook, J.M.: Birds of a feather: homophily in social networks. Annu. Rev. Sociol. 27(1), 415–444 (2001)CrossRef
29.
Zurück zum Zitat Mislove, A.: Online social networks: measurement, analysis, and applications to distributed information systems. Ph.D. thesis, Rice University (2009) Mislove, A.: Online social networks: measurement, analysis, and applications to distributed information systems. Ph.D. thesis, Rice University (2009)
31.
Zurück zum Zitat Park, H.M., Myaeng, S.H., Kang, U.: Pte: Enumerating trillion triangles on distributed systems. In: KDD, pp. 1115–1124 (2016) Park, H.M., Myaeng, S.H., Kang, U.: Pte: Enumerating trillion triangles on distributed systems. In: KDD, pp. 1115–1124 (2016)
32.
Zurück zum Zitat Park, H.M., Silvestri, F., Kang, U., Pagh, R.: Mapreduce triangle enumeration with guarantees. In: CIKM, pp. 1739–1748 (2014) Park, H.M., Silvestri, F., Kang, U., Pagh, R.: Mapreduce triangle enumeration with guarantees. In: CIKM, pp. 1739–1748 (2014)
33.
Zurück zum Zitat Pavan, A., Tangwongsan, K., Tirthapura, S., Wu, K.L.: Counting and sampling triangles from a graph stream. PVLDB 6(14), 1870–1881 (2013) Pavan, A., Tangwongsan, K., Tirthapura, S., Wu, K.L.: Counting and sampling triangles from a graph stream. PVLDB 6(14), 1870–1881 (2013)
34.
Zurück zum Zitat Seshadhri, C., Pinar, A., Kolda, T.G.: Triadic measures on graphs: the power of wedge sampling. In: SDM, pp. 10–18 (2013) Seshadhri, C., Pinar, A., Kolda, T.G.: Triadic measures on graphs: the power of wedge sampling. In: SDM, pp. 10–18 (2013)
35.
Zurück zum Zitat Shin, K.: Wrs: Waiting room sampling for accurate triangle counting in real graph streams. In: ICDM, pp. 1087–1092 (2017) Shin, K.: Wrs: Waiting room sampling for accurate triangle counting in real graph streams. In: ICDM, pp. 1087–1092 (2017)
36.
Zurück zum Zitat Shin, K., Eliassi-Rad, T., Faloutsos, C.: Patterns and anomalies in k-cores of real-world graphs with applications. Knowl. Inf. Syst. 54(3), 677–710 (2018)CrossRef Shin, K., Eliassi-Rad, T., Faloutsos, C.: Patterns and anomalies in k-cores of real-world graphs with applications. Knowl. Inf. Syst. 54(3), 677–710 (2018)CrossRef
37.
Zurück zum Zitat Shin, K., Hammoud, M., Lee, E., Oh, J., Faloutsos, C.: Tri-fly: Distributed estimation of global and local triangle counts in graph streams. In: PAKDD, pp. 651–663 (2018) Shin, K., Hammoud, M., Lee, E., Oh, J., Faloutsos, C.: Tri-fly: Distributed estimation of global and local triangle counts in graph streams. In: PAKDD, pp. 651–663 (2018)
38.
Zurück zum Zitat Shin, K., Kim, J., Hooi, B., Faloutsos, C.: Think before you discard: accurate triangle counting in graph streams with deletions. In: ECML/PKDD, pp. 141–157 (2018) Shin, K., Kim, J., Hooi, B., Faloutsos, C.: Think before you discard: accurate triangle counting in graph streams with deletions. In: ECML/PKDD, pp. 141–157 (2018)
39.
Zurück zum Zitat Shin, K., Lee, E., Oh, J., Hammoud, M., Faloutsos, C.: Cocos: Fast and accurate distributed triangle counting in graph streams. arXiv preprint arXiv:1802.04249 (2018) Shin, K., Lee, E., Oh, J., Hammoud, M., Faloutsos, C.: Cocos: Fast and accurate distributed triangle counting in graph streams. arXiv preprint arXiv:​1802.​04249 (2018)
40.
Zurück zum Zitat Shin, K., Oh, S., Kim, J., Hooi, B., Faloutsos, C.: Fast, accurate and provable triangle counting in fully dynamic graph streams. ACM Trans. Knowl. Discov. Data 14(2), 1–39 (2020)CrossRef Shin, K., Oh, S., Kim, J., Hooi, B., Faloutsos, C.: Fast, accurate and provable triangle counting in fully dynamic graph streams. ACM Trans. Knowl. Discov. Data 14(2), 1–39 (2020)CrossRef
41.
Zurück zum Zitat Shun, J., Tangwongsan, K.: Multicore triangle computations without tuning. In: ICDE, pp. 149–160 (2015) Shun, J., Tangwongsan, K.: Multicore triangle computations without tuning. In: ICDE, pp. 149–160 (2015)
42.
Zurück zum Zitat Spearman, C.: The proof and measurement of association between two things. Am. J. Psychol. 15(1), 72–101 (1904)CrossRef Spearman, C.: The proof and measurement of association between two things. Am. J. Psychol. 15(1), 72–101 (1904)CrossRef
43.
Zurück zum Zitat Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: WWW, pp. 607–614 (2011) Suri, S., Vassilvitskii, S.: Counting triangles and the curse of the last reducer. In: WWW, pp. 607–614 (2011)
44.
Zurück zum Zitat Tsourakakis, C.E.: Fast counting of triangles in large real networks without counting: algorithms and laws. In: ICDM, pp. 608–617 (2008) Tsourakakis, C.E.: Fast counting of triangles in large real networks without counting: algorithms and laws. In: ICDM, pp. 608–617 (2008)
45.
Zurück zum Zitat Tsourakakis, C.E., Kang, U., Miller, G.L., Faloutsos, C.: Doulion: counting triangles in massive graphs with a coin. In: KDD, pp. 837–846 (2009) Tsourakakis, C.E., Kang, U., Miller, G.L., Faloutsos, C.: Doulion: counting triangles in massive graphs with a coin. In: KDD, pp. 837–846 (2009)
46.
Zurück zum Zitat Turk, A., Türkoglu, D.: Revisiting wedge sampling for triangle counting. In: TheWebConf, pp. 1875–1885 (2019) Turk, A., Türkoglu, D.: Revisiting wedge sampling for triangle counting. In: TheWebConf, pp. 1875–1885 (2019)
47.
Zurück zum Zitat Türkoglu, D., Turk, A.: Edge-based wedge sampling to estimate triangle counts in very large graphs. In: ICDM, pp. 455–464 (2017) Türkoglu, D., Turk, A.: Edge-based wedge sampling to estimate triangle counts in very large graphs. In: ICDM, pp. 455–464 (2017)
48.
Zurück zum Zitat Viswanath, B., Mislove, A., Cha, M., Gummadi, K.P.: On the evolution of user interaction in facebook. In: WOSN, pp. 37–42 (2009) Viswanath, B., Mislove, A., Cha, M., Gummadi, K.P.: On the evolution of user interaction in facebook. In: WOSN, pp. 37–42 (2009)
50.
Zurück zum Zitat Wang, P., Qi, Y., Sun, Y., Zhang, X., Tao, J., Guan, X.: Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage. PVLDB 11(2), 162–175 (2017) Wang, P., Qi, Y., Sun, Y., Zhang, X., Tao, J., Guan, X.: Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage. PVLDB 11(2), 162–175 (2017)
51.
Zurück zum Zitat Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications, vol. 8. Cambridge University Press, Cambridge (1994)CrossRef Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications, vol. 8. Cambridge University Press, Cambridge (1994)CrossRef
52.
Zurück zum Zitat Watts, D.J., Strogatz, S.H.: Collective dynamics of small-worldnetworks. Nature 393(6684), 440–442 (1998)CrossRef Watts, D.J., Strogatz, S.H.: Collective dynamics of small-worldnetworks. Nature 393(6684), 440–442 (1998)CrossRef
Metadaten
Titel
Temporal locality-aware sampling for accurate triangle counting in real graph streams
verfasst von
Dongjin Lee
Kijung Shin
Christos Faloutsos
Publikationsdatum
12.08.2020
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 6/2020
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-020-00624-7

Weitere Artikel der Ausgabe 6/2020

The VLDB Journal 6/2020 Zur Ausgabe